
I can't figure out how to save a dataframe from a notebook as a dataset in my Flow. I have tried two approaches, and neither worked.

1. Approach 1, the pandas way: data.to_csv('data.csv')

This does not throw an error, but I don't see the dataset anywhere in my Flow.

2. Approach 2, the Dataiku way:

# Recipe outputs
recommenderdata_tosave = dataiku.Dataset("recommenderdata_tosave")
recommenderdata_tosave.write_with_schema(data_for_recommender)

In the notebook, I get this error: 

Exception: None: dataset does not exist: PROJECT.recommenderdata_tosave

This code works in a Python recipe in the Flow, but for some reason it fails in the notebook.

Any help would be appreciated.
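
A note on Approach 1: with a relative path, pandas writes a plain file into the current working directory of the Python process, which is most likely why nothing shows up in the Flow. The sketch below shows this outside DSS; the sample dataframe and the temporary directory are assumptions made for illustration only.

```python
import os
import tempfile

import pandas as pd

# Hypothetical sample data standing in for the real dataframe.
data = pd.DataFrame({"user": [1, 2], "item": ["a", "b"]})

with tempfile.TemporaryDirectory() as workdir:
    previous = os.getcwd()
    os.chdir(workdir)  # pretend this is the notebook's working directory
    try:
        # A relative path is resolved against the current working directory.
        data.to_csv("data.csv")
        written_path = os.path.abspath("data.csv")
        file_exists = os.path.exists(written_path)
        in_workdir = (
            os.path.dirname(os.path.realpath(written_path))
            == os.path.realpath(workdir)
        )
    finally:
        os.chdir(previous)

print(file_exists, in_workdir)
```

The file is an ordinary CSV on disk, not a dataset declared in the project, so it would not be expected to appear in the Flow.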


1 Answer

Hi,

"write_with_schema" does not create the dataset; it only "fills" a dataset that already exists. That is why the same code works in a recipe, where the output dataset is already declared in the Flow, but fails in the notebook. You first need to declare the dataset in your Flow. The best way to do this is to create it as a "managed" dataset, so that DSS handles all the connection details: in the Flow, click "+ Dataset" > "Internal" > "Managed dataset". You then only need to enter a name and select where the dataset should be stored. After that, you can write to it from the notebook.
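
To make the "declare first, then fill" rule explicit, here is a minimal sketch of a guard that refuses to write to a dataset that has not been declared. The helper write_if_declared and the FakeDataset class are hypothetical names introduced only so the example runs outside DSS; inside a notebook you would pass dataiku.Dataset and the project's real dataset names, as outlined in the trailing comments.

```python
import pandas as pd


def write_if_declared(df, dataset_name, declared_names, open_dataset):
    """Write df to dataset_name only if it is already declared in the Flow.

    open_dataset is any callable returning an object that has a
    write_with_schema(df) method (dataiku.Dataset inside DSS).
    """
    if dataset_name not in declared_names:
        raise ValueError(
            "Dataset %r is not declared in the Flow; "
            "create it as a managed dataset first." % dataset_name
        )
    open_dataset(dataset_name).write_with_schema(df)


class FakeDataset:
    """Hypothetical stand-in for dataiku.Dataset, for running outside DSS."""

    written = {}

    def __init__(self, name):
        self.name = name

    def write_with_schema(self, df):
        FakeDataset.written[self.name] = df.copy()


# Hypothetical sample data standing in for the real dataframe.
data_for_recommender = pd.DataFrame({"user": [1, 2], "item": ["a", "b"]})
declared = ["data_for_recommender_managed"]

# Declared dataset: the write goes through.
write_if_declared(
    data_for_recommender, "data_for_recommender_managed", declared, FakeDataset
)

# Undeclared dataset: fails early with a readable message.
try:
    write_if_declared(
        data_for_recommender, "recommenderdata_tosave", declared, FakeDataset
    )
    undeclared_error = None
except ValueError as exc:
    undeclared_error = str(exc)

# Inside a DSS notebook, the wiring could look like this (not run here):
#   import dataiku
#   project = dataiku.api_client().get_project(dataiku.default_project_key())
#   declared = [d["name"] for d in project.list_datasets()]
#   write_if_declared(data_for_recommender, "data_for_recommender_managed",
#                     declared, dataiku.Dataset)
```

The guard only checks and reports; the dataset itself still has to be created in the Flow as described above.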

Thanks for your quick and helpful answer. I can confirm that this works!

To anyone who comes after me, this is what I did:

1. Create a managed dataset as explained above (I named it data_for_recommender_managed).

2. Save the dataframe to it from the notebook with the following code:

import dataiku

# Recipe outputs
data_for_recommender_managed = dataiku.Dataset("data_for_recommender_managed")
# The dataframe held in memory in the notebook is called data_for_recommender
data_for_recommender_managed.write_with_schema(data_for_recommender)