0 votes

I tried to sync a PostgreSQL table in a dataset but columns of type `array` are converted to type `string`.
I need to keep those columns as `array` to be able to use functions such as `unnest` or `string_agg` in a further recipe.
If I change the schema manually in the tab type of the dataset, I get this error when running the job:

Can't handle column type ARRAY in SQL

I am using Dataiku DSS version 3.1.0 with a PostgreSQL 9.4 database

Does anyone have an idea?

asked by
Why not push the unnest and string_agg down to the DB level.

https://www.postgresql.org/docs/9.6/static/functions-array.html

If you are doing aggregation you can unnest the array, sum, all with a single SQL or WITH statement.
PostgreSQL is great with arrays, no need to pull out the data if you can do it within PostgreSQL.
Thanks for your answer!
This is actually what I'm currently doing, but the set of data has grown so much that it makes my query time out.
That's why I wanted to delay this process in a separate recipe.

1 Answer

0 votes
Hi,

This is not directly possible in DSS which doesn't support at all the array type of PostgreSQL. However, if you let DSS store the array as string, it will store it as JSON text. PostgreSQL then provides many functions to handle JSON texts, treat them as arrays, extract elements, ...: https://www.postgresql.org/docs/9.4/static/functions-json.html
answered by
Thanks, I'll dig into that solution.
944 questions
978 answers
989 comments
2,153 users

©Dataiku 2012-2018 - Privacy Policy