
I am currently looking at some data (CSV source) with some explanatory columns and monthly columns named 2001_01 to 2015_08. Each row can be identified by a unique identifier (e.g. FOO01). The data has seasonal dependencies, so I am trying to analyze it year over year.

What would be the proper Dataiku way to do this?

For instance, I would like to be able to select one row and plot its values on the Y axis against the months on the X axis, with one line per year.

Then I would like to compare rows: for example, divide row FOO1 by row BAR2 and plot the ratio in the same manner.

What would be the most efficient way to do this?

1 Answer


I would first reshape the data so that each row holds a single observation, with a year column and a month column. The Fold Multiple Columns processor might be helpful: http://doc.dataiku.com/dss/latest/preparation/reshaping.html#fold-multiple-columns

The data might even have been shaped like this before being transformed into columns 2001_01 to 2015_08.
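For illustration, here is a minimal sketch of the same wide-to-long reshaping in plain Python, for example if you want to prototype it outside a Prepare recipe. The sample data, the "id" and "label" column names, and the fold_periods helper are all assumptions made up for this example; they are not part of DSS.

```python
import csv
import io

# Hypothetical sample mirroring the layout described in the question:
# explanatory columns followed by monthly columns named YYYY_MM.
SAMPLE_CSV = """id,label,2001_01,2001_02,2002_01,2002_02
FOO1,first series,10,20,30,40
BAR2,second series,5,4,10,8
"""


def fold_periods(rows, id_column="id"):
    """Turn wide YYYY_MM columns into long (id, year, month, value) records."""
    long_rows = []
    for row in rows:
        for column, raw in row.items():
            if column is None:
                continue  # extra unnamed cells on a malformed line
            year, sep, month = column.partition("_")
            # Only columns shaped like 2001_01 are folded; the others are
            # treated as explanatory columns and ignored here.
            is_period = (
                sep == "_"
                and len(year) == 4 and year.isdigit()
                and len(month) == 2 and month.isdigit()
            )
            if not is_period or raw is None or raw == "":
                continue
            long_rows.append(
                {
                    "id": row[id_column],
                    "year": int(year),
                    "month": int(month),
                    "value": float(raw),
                }
            )
    return long_rows


long_rows = fold_periods(csv.DictReader(io.StringIO(SAMPLE_CSV)))
for record in long_rows[:3]:
    print(record)
```

Each output record carries one value together with its year and month, which is the shape the answer recommends before plotting.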

Then it's straightforward to plot the values against the month column, with one series per year.
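As a rough sketch of the year-over-year comparison and of the FOO1 / BAR2 ratio from the question, assuming the data is already in long format: the records below and the helper names series_by_year and ratio_by_year are hypothetical, chosen only for this example.

```python
from collections import defaultdict

# Hypothetical long-format records: (id, year, month, value).
records = [
    ("FOO1", 2001, 1, 10.0), ("FOO1", 2001, 2, 20.0),
    ("FOO1", 2002, 1, 30.0), ("FOO1", 2002, 2, 40.0),
    ("BAR2", 2001, 1, 5.0), ("BAR2", 2001, 2, 4.0),
    ("BAR2", 2002, 1, 10.0), ("BAR2", 2002, 2, 8.0),
]


def series_by_year(records, row_id):
    """Group one identifier's values as {year: {month: value}}."""
    table = defaultdict(dict)
    for rid, year, month, value in records:
        if rid == row_id:
            table[year][month] = value
    return dict(table)


def ratio_by_year(records, numerator_id, denominator_id):
    """Divide two identifiers month by month, keeping the year grouping."""
    numerator = series_by_year(records, numerator_id)
    denominator = series_by_year(records, denominator_id)
    ratio = {}
    for year, months in numerator.items():
        for month, value in months.items():
            divisor = denominator.get(year, {}).get(month)
            if divisor:  # skip months that are missing or zero in the divisor
                ratio.setdefault(year, {})[month] = value / divisor
    return ratio


foo = series_by_year(records, "FOO1")
ratio = ratio_by_year(records, "FOO1", "BAR2")
print(foo)
print(ratio)
```

Each inner dictionary is one line on the chart: its keys are the months for the X axis and its values go on the Y axis, so the years can be overlaid and compared directly.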