Hi everyone!
I'd like your input on the best approach to the following:
I have daily CSV files that contain no date column, but whose values change on a daily basis. I want to create a dataset based on these files. I will be adding files every day, and I need to add a date column to each file containing the date the file was added.
I created a SharePoint folder from which Dataiku will pull the files.
I was able to get Dataiku to add a date column using an expression recipe with the now() expression. This adds today's date, but tomorrow it will add tomorrow's date and overwrite the previous date.
Can I create a Python recipe to do this, or do you have any other suggestions?
Thanks,
Operating system used: Windows
Yes, you can create a Python recipe to do almost anything you want. You can do that from a Python recipe that reads from a folder rather than from a CSV or database-based dataset.
You say that "I will be add[ing] files every day." You don't say whether these files will overwrite the existing files or be added to the same directory alongside the files that are already there. You also don't say whether you have any control over how the original files are created. For example, you could create the files with names that reflect the current date. Or you could look at the timestamps of the files in the file system and use those dates. You also didn't mention anything about the relative size of these files.
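As a rough illustration of the file-timestamp option, here is a minimal sketch in plain Python. It assumes the files are reachable on a local file system and that their last-modified time reflects the day they were added; whether that timestamp survives a SharePoint sync or a re-download is something you would need to verify. The function name `file_added_date` is made up for this example.

```python
import datetime
import os


def file_added_date(path):
    """Return the file's last-modified date (local time) as a datetime.date.

    Assumption: the modification time is a good proxy for the day the file
    was added to the folder.
    """
    modified_ts = os.path.getmtime(path)  # seconds since the epoch
    return datetime.datetime.fromtimestamp(modified_ts).date()
```

A date embedded in the file name is usually more robust than this, because copying or re-syncing a file can change its timestamp.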
That said, depending on what exactly you are trying to do you might find one or more of these things helpful.
I'm sure that there are a bunch of other options that might be helpful. But, I'm not clear enough about your use case to provide a more specific set of suggestions.
Thanks for your fast response.
Here is some more detail:
1 - Files will be added to the same directory alongside the files that are already there.
2 - I download the file from a 3rd party source and rename it with the current date. Example: "FILENAME_02272024" (Feb 27, 2024).
3 - Each file contains an average of 1,500 rows and 25 columns (size about 202 KB).
In other scenarios I have created datasets with this type of structure - where the dataset is created from all the files within a directory. The difference is that those files have a date column.
Please advise.
Thanks!
Use the Files in Folder dataset and the File_Name to get your date:
https://community.dataiku.com/t5/Using-Dataiku/Using-the-quot-Files-in-folder-quot-dataset/m-p/33214
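If you would rather do the same thing in a Python recipe, the idea of taking the date from the file name can be sketched as below. This is plain Python, not the Dataiku API: it assumes names follow the "FILENAME_MMDDYYYY" pattern from the example above (with an optional extension), and it takes the CSV contents as in-memory strings, whereas in a recipe they would come from the folder. The names `date_from_file_name`, `stack_with_file_date` and the `file_date` column are made up for illustration.

```python
import csv
import datetime
import io
import re

# Assumed naming pattern: an underscore, eight digits (MMDDYYYY),
# then an optional extension such as ".csv" at the end of the name.
_DATE_IN_NAME = re.compile(r"_(\d{8})(?:\.[^.]+)?$")


def date_from_file_name(file_name):
    """Extract the MMDDYYYY date embedded at the end of a file name."""
    match = _DATE_IN_NAME.search(file_name)
    if match is None:
        raise ValueError(f"no MMDDYYYY date found in {file_name!r}")
    return datetime.datetime.strptime(match.group(1), "%m%d%Y").date()


def stack_with_file_date(files):
    """Stack several CSV files and tag every row with its file's date.

    `files` maps a file name to that file's CSV text. Returns a list of
    row dicts, each with an extra "file_date" key in ISO format.
    """
    rows = []
    for name, text in files.items():
        file_date = date_from_file_name(name).isoformat()
        for row in csv.DictReader(io.StringIO(text)):
            row["file_date"] = file_date
            rows.append(row)
    return rows
```

For example, `date_from_file_name("FILENAME_02272024")` returns `datetime.date(2024, 2, 27)`. Because the date comes from the name rather than from now(), rows from earlier files keep their original date each time the dataset is rebuilt.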
@Turribeach Thanks for sharing!