Output tab file to managed folder in DSS

ele_f
Level 2

Hi,

I have a dataframe in my DSS workflow which I want to change and store in a non-csv file within a folder.



 



Assume my dataframe is called df; for the example, you can recreate it as follows:



df = pd.DataFrame({"a": [1,2,3,4,5], "b": [6,7,8,9,10], "c": [11,12,13,14,15]})



I now want to add a few lines of comment above the dataframe and then save the file automatically in a folder.



First, I took my dataset and loaded it into a folder ("my_input_folder") with the DSS recipe "Export to folder", naming the file df.csv. Then I added a Python script that reads the file, adds the comments and writes the result to another folder ("my_output_folder"). The code is below, but it does not produce what I want. Could you please help?



 



# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
import os.path

# Recipe inputs
folder_path = dataiku.Folder("my_input_folder").get_path()
path_of_csv = os.path.join(folder_path, "df.csv")

# Recipe outputs
output2 = dataiku.Folder("my_output_folder")
output2_path = output2.get_path()

completeName = os.path.join(folder_path,  "df.csv")

file1 = open(completeName, "w")

toFile = raw_input("# This is my first comment\n This is my other comment \n") # I need to write two comments on two different rows

file1.write(toFile)

file1.close()

dirPath2 = os.path.join(output2_path,file1)



 



Thank you!



 

5 Replies
Alex_Combessie
Dataiker Alumni
Hello,

What is the expected format and content of your output? Could you give us an example in text format? We would like to better understand the goal of adding comments to a csv file.

From our understanding, a possibility for you could be to:

- use a python recipe reading the original csv as a pandas dataframe,

- add your comments inside the dataframe using the pandas merge or append methods,

- write your dataframe to file using one of pandas methods: https://pandas.pydata.org/pandas-docs/stable/api.html#id12

Cheers,

Alex
ele_f
Level 2
Author
Hi Alex,
thanks for your reply.
I think what you advised is similar to what I did (the Python code above was meant to read the file and add the comments).
However, once I add the comments to my df, the format is no longer compatible with a DSS dataframe, which is why I was trying to use managed folders.

Essentially, given my df:

   a   b   c
0  1   6  11
1  2   7  12
2  3   8  13
3  4   9  14
4  5  10  15

I want to write some comments above it (this is necessary to meet a file format requirement I have been given). So the Python output would look like this:

# comment
# comment2

   a   b   c
0  1   6  11
1  2   7  12
2  3   8  13
3  4   9  14
4  5  10  15

Now, the above cannot be stored in DSS as a dataset because it does not respect the row-column DF format, so I want to store this file in a managed folder, with the extension .tab.

Let me know if it is not clear.
Alex_Combessie
Dataiker Alumni
Hi,
Thanks for the clarification. This is more of a Python question than a DSS one. You can have a look at solutions like https://stackoverflow.com/questions/5914627/prepend-line-to-beginning-of-a-file . This way you can:
- write csv to file using pandas
- prepend your comments to the csv text file
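
The two steps above could be sketched roughly as follows. This is only an illustration, not a tested DSS recipe: the helper name prepend_lines is hypothetical, a temporary directory stands in for the managed folder path, and index=False and the tab separator are assumptions about the required .tab layout.

```python
import os
import tempfile

import pandas as pd


def prepend_lines(path, lines):
    """Rewrite the text file at `path` with `lines` placed before its current content."""
    with open(path, "r", encoding="utf-8") as f:
        body = f.read()
    with open(path, "w", encoding="utf-8") as f:
        f.write("".join(line + "\n" for line in lines))
        f.write(body)


df = pd.DataFrame({"a": [1, 2, 3, 4, 5], "b": [6, 7, 8, 9, 10], "c": [11, 12, 13, 14, 15]})

# Assumption: a temporary directory stands in for the managed folder path.
out_dir = tempfile.mkdtemp()
out_path = os.path.join(out_dir, "df.tab")

# Step 1: write the table as tab-separated text (index=False is an assumption).
df.to_csv(out_path, sep="\t", index=False)

# Step 2: prepend the two comment lines and a blank separator line.
prepend_lines(out_path, ["# comment", "# comment2", ""])
```

In a recipe, out_dir would instead be the path of the output folder, for example the output2_path variable from the original post, provided the folder lives on the local filesystem.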
ele_f
Level 2
Author
Thanks. Can you please just let me know whether DSS folders accept any file format? That is, can I write .tab files to a DSS folder, or can it only contain csv files?
Alex_Combessie
Dataiker Alumni
Yes, a "DSS" folder is just a regular filesystem folder where you can store anything you want.
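
As an illustration of storing a non-csv file, here is a minimal sketch that builds the commented, tab-separated text in memory and hands it to a folder object. The function names build_tab_text and save_tab_to_folder are hypothetical; the sketch assumes the folder object exposes upload_data(path, data), as dataiku.Folder does in recent DSS versions, so please check this against the API of your own DSS version.

```python
import io

import pandas as pd


def build_tab_text(df, comments):
    """Return the comment lines followed by the DataFrame as tab-separated text."""
    buf = io.StringIO()
    for line in comments:
        buf.write(line + "\n")
    # index=False is an assumption about the required file layout.
    df.to_csv(buf, sep="\t", index=False)
    return buf.getvalue()


def save_tab_to_folder(folder, file_name, df, comments):
    """Upload the text to a folder-like object exposing upload_data(path, data)."""
    folder.upload_data(file_name, build_tab_text(df, comments).encode("utf-8"))


# Inside a DSS Python recipe this might be called roughly as follows
# (not executed here, since it needs a running DSS instance):
#   import dataiku
#   out = dataiku.Folder("my_output_folder")
#   save_tab_to_folder(out, "df.tab", df, ["# comment", "# comment2"])
```

Going through the folder object rather than get_path() has the advantage that it does not depend on the folder being on the local filesystem.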