
I have a dataframe in my DSS workflow that I want to modify and then store as a non-CSV file in a folder.


Assume my dataframe is called df; for this example you can recreate it as follows:

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [6, 7, 8, 9, 10], "c": [11, 12, 13, 14, 15]})

I now want to add a few comment lines above the dataframe and then save the file automatically to a folder.

First, I exported my dataset to a folder ("my_input_folder") with the DSS "Export to folder" recipe, naming the file df.csv. Then I added a Python recipe that reads the file, adds the comments and writes the result to another folder ("my_output_folder"). The code is below, but it doesn't produce what I want. Could you please help?


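# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
import os.path

# Recipe inputs
folder_path = dataiku.Folder("my_input_folder").get_path()
path_of_csv = os.path.join(folder_path, "df.csv")

# Recipe outputs
output2 = dataiku.Folder("my_output_folder")
output2_path = output2.get_path()

# This path points at the input folder's df.csv, not at output2_path
completeName = os.path.join(folder_path, "df.csv")

# Opening in "w" mode truncates (empties) the exported df.csv
file1 = open(completeName, "w")

# I need to write two comments on two different rows.
# raw_input() prints this text as a prompt and waits for keyboard input;
# it writes nothing to file1
toFile = raw_input("# This is my first comment\n This is my other comment \n")

# os.path.join expects strings, so passing the file object file1 raises an error
dirPath2 = os.path.join(output2_path, file1)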


Thank you!



1 Answer


What is the expected format and content of your output? Could you give us an example in text format? We would like to better understand the goal of adding comments to a csv file.

From our understanding, one possibility would be to:

- use a Python recipe to read the original csv as a pandas dataframe,

- add your comments inside the dataframe using the pandas merge or append method,

- write your dataframe to a file using one of the pandas methods: https://pandas.pydata.org/pandas-docs/stable/api.html#id12
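A minimal sketch of those steps, with one swap stated plainly: instead of merging the comments into the dataframe (which would break its row/column shape), it writes them as plain text lines to the output file and then lets `DataFrame.to_csv` append the table to the same open file handle. The folder paths are assumptions: in DSS they would come from `dataiku.Folder(...).get_path()` as in the question; here temporary directories stand in so the sketch runs outside DSS, and the output file name `df_with_comments.csv` is hypothetical.

```python
import os
import tempfile

import pandas as pd

# Stand-ins for the managed folders (assumption: in DSS these would be
# dataiku.Folder("my_input_folder").get_path() and
# dataiku.Folder("my_output_folder").get_path())
input_dir = tempfile.mkdtemp()
output_dir = tempfile.mkdtemp()

# Recreate the exported df.csv so the sketch is self-contained
pd.DataFrame({"a": [1, 2, 3, 4, 5],
              "b": [6, 7, 8, 9, 10],
              "c": [11, 12, 13, 14, 15]}).to_csv(
    os.path.join(input_dir, "df.csv"), index=False)

# Step 1: read the original csv as a pandas DataFrame
df = pd.read_csv(os.path.join(input_dir, "df.csv"))

# Steps 2 and 3: write the comment lines first, then the DataFrame,
# into one file (hypothetical name) in the output folder
comments = ["# This is my first comment", "# This is my other comment"]
out_path = os.path.join(output_dir, "df_with_comments.csv")
with open(out_path, "w", newline="") as f:
    for line in comments:
        f.write(line + "\n")
    # to_csv accepts an already open file handle and writes after the comments
    df.to_csv(f, index=False)
```

Because the comment lines start with `#`, the file can still be read back as a table with `pd.read_csv(out_path, comment="#")`.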


Hi Alex,
thanks for your reply.
I think what you advised is similar to what I did (the Python code above was used to read the file and add the comments).
However, once I add the comments to my df, it is no longer compatible with the DSS dataset format, which is why I was trying to use managed folders.

Essentially, given my df:

   a   b   c
0  1   6  11
1  2   7  12
2  3   8  13
3  4   9  14
4  5  10  15

I want to write some comments above it (this is necessary to meet a file format requirement I have been given), so the output of the Python code would look like this:

# comment
# comment2

   a   b   c
0  1   6  11
1  2   7  12
2  3   8  13
3  4   9  14
4  5  10  15

Now, the above cannot be stored as a DSS dataset because it does not respect the row/column format of a dataframe, so I want to store this file in a managed folder, with the extension .tab.

Let me know if it is not clear.

Thanks for the clarification. This is more of a Python question than a DSS one. You can have a look at solutions like https://stackoverflow.com/questions/5914627/prepend-line-to-beginning-of-a-file. This way you can:
- write the csv to a file using pandas
- prepend your comments to the csv text file
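The two steps above can be sketched as follows. This is one way to do the prepend, not the only one: it reads the whole file back into memory and rewrites it, which is fine for a file of this size. Assumptions: a temporary directory stands in for `dataiku.Folder("my_output_folder").get_path()`, the file is written tab-separated to match the .tab requirement (the exact format the requirement expects is not stated in the thread), and `prepend_lines` is a hypothetical helper name.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5],
                   "b": [6, 7, 8, 9, 10],
                   "c": [11, 12, 13, 14, 15]})

# Stand-in for the output folder (assumption: in DSS, use
# dataiku.Folder("my_output_folder").get_path())
output_dir = tempfile.mkdtemp()
path = os.path.join(output_dir, "df.tab")

# Step 1: write the table to file using pandas (tab-separated, no index)
df.to_csv(path, sep="\t", index=False)


def prepend_lines(file_path, lines):
    """Hypothetical helper: rewrite file_path with `lines` placed before its content."""
    with open(file_path, "r") as f:
        original = f.read()
    with open(file_path, "w") as f:
        f.write("".join(line + "\n" for line in lines))
        f.write(original)


# Step 2: prepend the comments to the text file
prepend_lines(path, ["# comment", "# comment2"])
```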

Thanks. Can you please let me know whether DSS folders accept any file format? I.e. can I write .tab files into a DSS folder, or can it only contain csv files?

Yes, a "DSS" folder is just a regular filesystem folder where you can store anything you want.
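Since a folder stored on the local filesystem is a plain directory, the .tab extension is only part of the file name and the content can be any text. A minimal sketch that writes exactly the layout shown earlier in the thread (comment lines, a blank line, then the aligned table with its index), assuming that whitespace-aligned layout rather than tab-separated values is what the format requirement means. `DataFrame.to_string()` renders the same text that `print(df)` shows; the temporary directory is a stand-in for `dataiku.Folder("my_output_folder").get_path()`.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3, 4, 5],
                   "b": [6, 7, 8, 9, 10],
                   "c": [11, 12, 13, 14, 15]})

# Stand-in for the managed folder's local path (assumption: in DSS this
# would be dataiku.Folder("my_output_folder").get_path())
folder_path = tempfile.mkdtemp()

# Any file name and extension works in a plain directory
tab_path = os.path.join(folder_path, "df.tab")
with open(tab_path, "w") as f:
    f.write("# comment\n# comment2\n\n")
    # to_string() returns the aligned table (header, index, values) as one string
    f.write(df.to_string() + "\n")

print(sorted(os.listdir(folder_path)))  # prints ['df.tab']
```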

©Dataiku 2012-2018