
I have two questions around the chart engine:

- The documentation says "This allows you to perform visual analytics on very large data extracts". Can this be quantified in terms of GB or number of rows? Would it be unrealistic to aggregate low-cardinality columns in a dataset of 100M+ rows? Columnar compression should be able to handle this.

- If the underlying dataset changes in the source system, how can I make sure that the data stays in sync between source and DSS server?

Many thanks


1 Answer

Hi Uli,

The practical limitations of the built-in charts engine depend mainly on the time it takes to build the columnar cache, the disk space the cache requires, and the cardinality of the columns (it does not scale well on very high-cardinality columns).

100M rows and low-cardinality columns should definitely be OK.
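To illustrate why low-cardinality columns compress so well, here is a minimal sketch of dictionary encoding, the basic idea behind columnar compression. This is purely illustrative, not how DSS implements its cache: a column of a million strings drawn from a handful of distinct values reduces to a tiny dictionary plus an array of small integer codes.

```python
def dictionary_encode(values):
    """Return (dictionary, codes) for a sequence of values.
    Each distinct value is stored once; the column becomes integer codes."""
    dictionary = []
    index = {}          # value -> code
    codes = []
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes

def dictionary_decode(dictionary, codes):
    """Reconstruct the original column from the dictionary and codes."""
    return [dictionary[c] for c in codes]

# 1M rows, only 3 distinct values: the dictionary stays tiny,
# and aggregation can group on the integer codes instead of strings.
column = ["US", "FR", "DE", "US", "FR"] * 200_000

dictionary, codes = dictionary_encode(column)
assert len(dictionary) == 3
assert dictionary_decode(dictionary, codes) == column
```

With high-cardinality columns (e.g. unique IDs), the dictionary grows as large as the column itself, which is why the charts engine struggles there.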

The cache is automatically dropped when running on managed datasets (i.e., datasets built by DSS). For source datasets, we have chosen not to autodetect changes in the underlying source, because doing so would be too expensive. So if the data changes in an external dataset, you have to click the "Refresh sample" button.
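If you want to avoid refreshing blindly, one pragmatic pattern (a generic sketch, not a DSS feature or API) is to keep a cheap fingerprint of the source, e.g. row count plus latest update timestamp, and only trigger a refresh when it changes. The `updated_at` field below is a hypothetical schema for illustration.

```python
# Generic change-detection sketch (not a DSS API): compare a cheap
# fingerprint of the source table to decide whether a refresh is needed.

def source_fingerprint(rows):
    """Cheap change indicator: (row count, latest update timestamp).
    `rows` is a list of dicts with an 'updated_at' field (hypothetical)."""
    if not rows:
        return (0, None)
    return (len(rows), max(r["updated_at"] for r in rows))

def needs_refresh(last_fingerprint, rows):
    """True if the source looks different from the last snapshot we synced."""
    return source_fingerprint(rows) != last_fingerprint

rows = [{"id": 1, "updated_at": "2018-01-01"},
        {"id": 2, "updated_at": "2018-02-01"}]
fp = source_fingerprint(rows)        # record fingerprint after syncing
assert not needs_refresh(fp, rows)   # unchanged source: no refresh needed

rows.append({"id": 3, "updated_at": "2018-03-01"})
assert needs_refresh(fp, rows)       # new row: time to refresh the sample
```

In practice you would compute the fingerprint with a lightweight query against the source (e.g. `SELECT COUNT(*), MAX(updated_at)`), which is far cheaper than re-reading the whole table.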