0 votes

I tried to sync a PostgreSQL table in a dataset but columns of type `array` are converted to type `string`.
I need to keep those columns as `array` to be able to use functions such as `unnest` or `string_agg` in a further recipe.
If I change the schema manually in the tab type of the dataset, I get this error when running the job:

Can't handle column type ARRAY in SQL

I am using Dataiku DSS version 3.1.0 with a PostgreSQL 9.4 database

Does anyone have an idea?

by
Why not push the unnest and string_agg down to the DB level.

https://www.postgresql.org/docs/9.6/static/functions-array.html

If you are doing aggregation you can unnest the array, sum, all with a single SQL or WITH statement.
PostgreSQL is great with arrays, no need to pull out the data if you can do it within PostgreSQL.
Thanks for your answer!
This is actually what I'm currently doing, but the set of data has grown so much that it makes my query time out.
That's why I wanted to delay this process in a separate recipe.

1 Answer

0 votes
Hi,

This is not directly possible in DSS which doesn't support at all the array type of PostgreSQL. However, if you let DSS store the array as string, it will store it as JSON text. PostgreSQL then provides many functions to handle JSON texts, treat them as arrays, extract elements, ...: https://www.postgresql.org/docs/9.4/static/functions-json.html
by
Thanks, I'll dig into that solution.
1,082 questions
1,126 answers
1,250 comments
10,889 users

©Dataiku 2012-2018 - Privacy Policy