Is there a maximum length for the dynamic SQL generated for moving data into a SQL based table?
I am writing a Python recipe to join two CSV datasets and write the result to a Postgres table.
Here is the first error:
An invalid argument has been encountered : Column name not matching between query and schema (Total Randomized or Receiving Treatment-United States of Americ, expected Total Randomized or Receiving Treatment-United States of America) SELECT * FROM "CLINICALTRIALCOSTPROJECTION_sitespatientsvisitsbyregion_t" LIMIT 10000. Please check dataset schema.
The length of the column ("Total Randomized or Receiving Treatment-United States of America") is 64
Here is another error after the script runs, when I try to explore the data (I get a similar error to the one above) and then go to the created table's settings:
Connection OK. Table exists, but schema does not match : Name mismatch for column 39 : '1st Patient Recruited _FPI_' in dataset, '1st Patient Recruited (FPI)' in table
So the Python recipe ran and created the table in Postgres. It is a populated table, but I cannot import it because of parentheses in the table column names:
An invalid argument has been encountered : Column name not matching between query and schema (1st Patient Recruited (FPI), expected 1st Patient Recruited _FPI_) SELECT * FROM "CLINICALTRIALCOSTPROJECTION_jointed_t" LIMIT 10000. Please check dataset schema.
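One way to sidestep this kind of mismatch (a sketch, not from the thread, assuming pandas is available in the recipe) is to sanitize the column names in the DataFrame before writing, so the names DSS records and the names the database accepts are identical:

```python
import re

import pandas as pd

# A hypothetical frame with one of the offending column names from the error.
df = pd.DataFrame(columns=["1st Patient Recruited (FPI)", "OPPORTUNITY_NUMBER__C"])

# Replace parentheses with underscores before writing, mirroring the
# "_FPI_" form that ended up in the DSS dataset schema.
df.columns = [re.sub(r"[()]", "_", c) for c in df.columns]
print(list(df.columns))  # ['1st Patient Recruited _FPI_', 'OPPORTUNITY_NUMBER__C']
```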
The problem is very likely to come from your database.
Assuming you are using Postgres, there is a default maximum length for identifiers. You can write a longer string for a column name, but Postgres will silently truncate it, as per the documentation.
DSS wrote its schema with the label you provided when creating the output dataset, while Postgres stored the truncated version. That is why, after the recipe completes and DSS tries to synchronize its schema (with the full label) against the Postgres schema (with the truncated label), it raises an error.
As per the Postgres documentation, you can raise this limit in the Postgres build configuration:
The system uses no more than NAMEDATALEN-1 bytes of an identifier; longer names can be written in commands, but they will be truncated. By default, NAMEDATALEN is 64 so the maximum identifier length is 63 bytes. If this limit is problematic, it can be raised by changing the NAMEDATALEN constant in src/include/pg_config_manual.h.
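The mismatch in the first error can be reproduced directly from this rule (a sketch, not from the thread; the column name is taken from the error message above). The name is 64 bytes, one over the 63-byte limit, so Postgres drops the final "a":

```python
# Postgres truncates identifiers to NAMEDATALEN - 1 bytes (63 by default).
NAMEDATALEN = 64

name = "Total Randomized or Receiving Treatment-United States of America"
print(len(name.encode("utf-8")))  # 64 bytes: one over the 63-byte limit

# Reproduce the truncation Postgres applies, yielding the name from the error.
truncated = name.encode("utf-8")[: NAMEDATALEN - 1].decode("utf-8")
print(truncated)  # 'Total Randomized or Receiving Treatment-United States of Americ'
```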
Cheers from DKU
Thanks @kgdku. What about my follow-up comment?
What method do you use to write the SQL table?
Hi @Alex. I had two uploaded CSV files, and I was using a Python script recipe to join them and write the result to a Postgres table. I used pgAdmin to check the structure of the table in Postgres and it was fine, but I was not able to import the table as a new dataset.
Can you paste the part of the Python code where you write the data back?
Here you go:
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
# Read recipe inputs
globalCombined = dataiku.Dataset("GlobalCombined")
globalCombined_df = globalCombined.get_dataframe()
sitesPatientsVisitsbyRegion = dataiku.Dataset("SitesPatientsVisitsbyRegion")
sitesPatientsVisitsbyRegion_df = sitesPatientsVisitsbyRegion.get_dataframe()
joined_python_df = globalCombined_df.set_index('OPPORTUNITY_NUMBER__C', drop=False).join(sitesPatientsVisitsbyRegion_df.set_index("OPPORTUNITY_NUMBER__C"), how='left')
# Write recipe outputs
joined_python = dataiku.Dataset("joined_python")
joined_python.write_with_schema(joined_python_df)
Hi Pouya, try using dropAndCreate=True as a parameter of write_with_schema().
Note that doing a join in memory using Python is not very efficient. Can you use a Join recipe instead, to leverage in-database computation?
Hi, did you solve your issue?