
Hi,
A few months ago, I used the function

create_dataset(name)

But now I am asked to use

create_dataset(dataset_name, type)

 

The "type" parameter is new and mandatory, but there is no explanation of what it is exactly. Does anyone have an idea?

    project = self.automation.client.get_project('<PROJECT>')
    project.create_dataset(dataset_name = '<dataset>', type='Filesystem')

Even when I pass a value for this parameter, I still get this error message:

TypeError: create_dataset() missing 1 required positional argument: 'type'



Thanks ! 

 


1 Answer


Hi,

The argument is not called "name" but "dataset_name", so you can use either:

create_dataset("administration", "Filesystem")

or

create_dataset(dataset_name="administration", type="Filesystem")
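
To illustrate why the two calls above are equivalent, here is a minimal sketch. It uses a hypothetical stand-in function with the same two parameter names as in the question (it is not the real Dataiku client method), plus a small hypothetical helper, required_parameters, that lists the arguments a callable requires.

```python
import inspect


# Hypothetical stand-in that only mirrors the parameter names from the question.
# It is NOT the real Dataiku client method.
def create_dataset(dataset_name, type, params=None):
    return {"name": dataset_name, "type": type, "params": params or {}}


# Positional and keyword calls bind to the same parameters.
positional = create_dataset("administration", "Filesystem")
keyword = create_dataset(dataset_name="administration", type="Filesystem")
assert positional == keyword


def required_parameters(func):
    """Return the names of the parameters that have no default value."""
    named_kinds = (
        inspect.Parameter.POSITIONAL_ONLY,
        inspect.Parameter.POSITIONAL_OR_KEYWORD,
        inspect.Parameter.KEYWORD_ONLY,
    )
    return [
        name
        for name, parameter in inspect.signature(func).parameters.items()
        if parameter.default is inspect.Parameter.empty
        and parameter.kind in named_kinds
    ]


print(required_parameters(create_dataset))  # ['dataset_name', 'type']
```

Calling required_parameters(project.create_dataset) on a real project object should show which arguments the installed client version expects, which may help check whether the code being run matches the signature you think you are calling.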
Sorry, but I had already tried the first solution, and the second changed nothing: the problem is still the "type" parameter, which is reported as "missing".
I'm sorry, we can't reproduce your issue. Here is sample code that creates a filesystem dataset from scratch:

dataset = project.create_dataset("my-fs-dataset", "Filesystem")

definition = dataset.get_definition()
definition["params"]["connection"] = "filesystem_root"
definition["params"]["path"] = "/home/centos/titanic/kaggle_titanic_train.csv"
definition["formatType"] = "csv"
definition["formatParams"] = {"separator": ",", "style": "excel", "parseHeaderRow": True }
definition["schema"] = { "columns" : [{"name": "PassengerId", "type":"string"}, {"name": "Survived", "type": "int"}]}
dataset.set_definition(definition)
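
If the same settings are needed for several datasets, the dictionary edits in the sample can be gathered in a plain function. This is only a sketch: the helper name with_csv_settings is hypothetical, the keys are the ones used in the sample above, and it works on an ordinary dict so it can be tried without a DSS instance.

```python
import copy


def with_csv_settings(definition, connection, path, columns, separator=","):
    """Return a copy of a dataset definition dict filled with CSV settings.

    Hypothetical helper: the keys mirror the sample above, nothing more.
    """
    updated = copy.deepcopy(definition)  # leave the caller's dict untouched
    updated.setdefault("params", {})
    updated["params"]["connection"] = connection
    updated["params"]["path"] = path
    updated["formatType"] = "csv"
    updated["formatParams"] = {
        "separator": separator,
        "style": "excel",
        "parseHeaderRow": True,
    }
    updated["schema"] = {
        "columns": [{"name": name, "type": col_type} for name, col_type in columns]
    }
    return updated


base = {"params": {}}
result = with_csv_settings(
    base,
    "filesystem_root",
    "/home/centos/titanic/kaggle_titanic_train.csv",
    [("PassengerId", "string"), ("Survived", "int")],
)
```

With a real dataset, the result of with_csv_settings(dataset.get_definition(), ...) would then be passed to dataset.set_definition, as in the sample.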
First, thank you for your answers!
Then, here is the code I wrote:


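import dataiku
import pandas as pd

# "design" and "automation" are objects defined elsewhere in my code
file_path = '/'.join([project.project_key,
                      "_V" + str(project.get_variables()['standard']['version']),
                      "administration"])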
format_params = {
    'separator': ',',
    'style': 'unix',
    'parseHeaderRow': True
}

try:
    dataset = dataiku.Dataset('administration')
    df_dataset = dataset.get_dataframe()

except Exception:  # the dataset does not exist yet, so create it
    project = design.client.get_project('ADMINISTRATION')
    project.create_dataset('administration',
                           'Filesystem',
                           params={
                               "connection": "filesystem_managed",
                               "path": file_path
                           },
                           formatType='csv',
                           formatParams=format_params
                          )

df = pd.concat([
    design.table,
    automation.table], sort=False)
        
dataset = dataiku.Dataset('administration')
dataset.write_with_schema(df)

The problem is with the write_with_schema(df) call. Here is the exception I get:
Exception: None: b'Internal error, caused by: NullPointerException: null.
I think it has something to do with the dropAndCreate parameter...