Sign up to take part
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Hi all,
I wrote a python code using python recipe, I need to add inside it a command that achieves:
The result of my code is a dataframe named combined_df, I need to insert the data in it into another table that's connected to SQL server.
Also this SQL table is an input to the python code, is this doable?
The idea is I'm considering the SQL table as original data, then when new data coming in I'm comparing both, and I need the updates to be inserted to the SQL table (original).
Hi @Arwamus0 ,
You can write to SQL table by providing output dataset in DSS.
If you are both reading/writing to the that table this possible but you may want to ensure you set append mode.
You can see the answer from : https://community.dataiku.com/t5/General-Discussion/Append-Data-without-drop-table-with-python/m-p/1...
Thanks
Hi @AlexT
When there is new data, the old data column(ValidTo) has to be populated and this is done using the python code, if I append the new data this won't happen.
I need the table to be overwritten with my new df.
Also I got an error saying 'Dataset' object has no attribute 'spec_item'
Thanks
Hi @Arwamus0 ,
If you are consolidating and have the final df you want to write code, you can write simply.
test = dataiku.Dataset("test")
test.write_with_schema(pandas_dataframe)
THanks
Hi @AlexT
It worked thanks a lot..
but now I have a problem it's duplicating the data inside the table, is there a command I can write before writing into the table to truncate it first?
you can use drop_duplicates on your dataframe
https://pandas.pydata.org/docs/reference/api/pandas.DataFrame.drop_duplicates.html
the duplicates are happening in SQL table not in the df
Did you remove the append spec part?
yes it gives an error 'Dataset' object has no attribute 'spec_item'
I can't see a reason the data would be duplicated the write_with_schema should drop the table and then insert, did you include append mode in the recipe - Input/Output tab?
If not please open a support ticket with the job diagnostics so we can investigate.
Thanks
Hi @Arwamus0, could you please post your Python script, or at least the parts that deal with Dataset input/output?
If you're just trying to overwrite the output Dataset with your dataframe you need to make sure the Append button in the Input/Output tab of your Python recipe is turned off.