0 votes

I'm new to dataiku DSS. However, I've worked with data for the last 10 years of my career.  Usually with SQL. 

I'm trying to figure out a work flow that makes it easy to Join / group / graph data sets in Dataiku.

In the project I'm working on I've finished my ETL phase and used Dataiku to extract XML data and load into PostgreSQL.  This had to be done in several passes through the data because the data has several different units of analysis, yet they all share a common key.  So far so good.  Now I want to move into exploratory data analysis.  I need to be able to look at the data on the screen at any of these units of analyses and say things like:

  • Let's join this set to that so that we can filter.  And then graph each of the variables individually.  Then I discover Oo that data has an interesting shape.  Now I want to add that minor insight to a dashboard.  Let's look at the next data element.

I'm finding this process awkward.  I'm wondering if I'm doing it wrong or have not found the path through the software that makes this easy.

Are there any videos or documents that suggest the best way to do this type of analysis within the context of Dataiku DSS?


1 Answer

0 votes
If all your datasets have a common key, and you are just plotting different variables on multiple charts, you could try the most straightforward approach:

- Use a Join recipe to Join all your datasets at once on that common key

- Open the resulting dataset and head to the Charts tab. You will find that charts have a "Filter" box in their options, letting you filter your data.

- Create as many charts as you need. Of course you can publish any of these on the dashboard.


However, I am not sure that I understand correctly what you are trying to achieve. If the approach I pointed out is not possible and you need more flexibility, you could create a SQL Query dataset, and edit it until you found what you are looking for. Then if you want to keep it, create a SQL Query recipe, in order to save the results. The steps would look like this:

- Create a new dataset on your SQL database (it will appear in your flow as a single dataset, blue square). At creation screen, select Mode: "SQL query" and type a query

- On the dataset, create a chart. If you are not satisfied, go to settings tab, edit the query, and update charts.

- Once satisfied, copy the query to a SQL Query recipe (this one will appear in the flow as a code recipe, orange circle, outputting a dataset), this will create a managed dataset on which you can recreate a chart, and publish it.

- Rinse and repeat


Please don't hesitate to comment on that answer if you need clarification or if I missed something in your question!
Thanks for the feedback.  Some things to try there.  

From a clarification point of view.  Although the various records share a common top level key.  Many of them have different units of analysis when we get to the details.  The data I'm looking at come form a support chat type situation.  One master support records number. But each statement from each side of the conversation get's it's own record. Different data elements from each side of the conversation.  The count of records coming from each side of the conversation can vary.  The difference between these records is the source and the time offset.  So I can not just join these together for one long row without somehow aggregate the utterances from each side of the conversation.  This did not seem to be dooable from the gui portion of the product.  So I've started to work with a PostgreSQL server behind dataiku.  

I have started to go down the road of a SQL dataset.  Have not had time to fully investigate this.  Other items on the to do list.

One of the challenges I've noticed is that there are some nice histograms with a nice horizontal whisker chart in some for the analysis screens.  I'm not finding a way to reproduce these using the standard chart tool and push this to the dashboard.
1,082 questions
1,126 answers
10,889 users

┬ęDataiku 2012-2018 - Privacy Policy