Coming soon: We’re working on a brand new, revamped Community experience. Want to receive updates? Sign up now!

0 votes

I tried to sync a PostgreSQL table in a dataset but columns of type `array` are converted to type `string`.
I need to keep those columns as `array` to be able to use functions such as `unnest` or `string_agg` in a further recipe.
If I change the schema manually in the tab type of the dataset, I get this error when running the job:

Can't handle column type ARRAY in SQL

I am using Dataiku DSS version 3.1.0 with a PostgreSQL 9.4 database

Does anyone have an idea?

Why not push the unnest and string_agg down to the DB level.

If you are doing aggregation you can unnest the array, sum, all with a single SQL or WITH statement.
PostgreSQL is great with arrays, no need to pull out the data if you can do it within PostgreSQL.
Thanks for your answer!
This is actually what I'm currently doing, but the set of data has grown so much that it makes my query time out.
That's why I wanted to delay this process in a separate recipe.

1 Answer

0 votes

This is not directly possible in DSS which doesn't support at all the array type of PostgreSQL. However, if you let DSS store the array as string, it will store it as JSON text. PostgreSQL then provides many functions to handle JSON texts, treat them as arrays, extract elements, ...:
Thanks, I'll dig into that solution.
1,339 questions
1,365 answers
11,916 users

©Dataiku 2012-2018 - Privacy Policy