Use case: you want to sync a local file with a Dataiku dataset.

1 Answer

Hi Frank,

You can do this with a custom Python recipe in a plugin or in a scenario (I usually do it in a scenario), with something similar to:

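import dataiku
from dataikuapi import SyncRecipeCreator
from dataiku.scenario import Scenario
import pandas as pd

# Handles used below (they were implicit in my original snippet)
client = dataiku.api_client()
project = client.get_project(dataiku.default_project_key())
scenario = Scenario()

folder_path = "path/to/file/" # Don't forget the trailing /
file = "file.csv"
dataset_name = "input_dataset" # placeholder name, pick your own

df = pd.read_csv(folder_path + file) # adapt the parameters (separator, encoding, ...) to your file

# Declare the file as a Filesystem dataset; adapt the connection and format parameters too
dataset = project.create_dataset(dataset_name, 'Filesystem', params={'connection': 'filesystem_root', 'path': folder_path + file}, formatType='csv', formatParams={'separator': ';', 'style': 'no_escape_no_quote', 'parseHeaderRow': True})

# I usually set every column to string first and change the types afterwards
dataset.set_schema({'columns': [{'name': column, 'type': 'string'} for column in df.columns]})

# Create a sync recipe from the new dataset to "output_dataset" (which must already exist)
builder = SyncRecipeCreator("sync_output_dataset", project)
builder = builder.with_input(dataset_name)
builder = builder.with_output("output_dataset", append=False)
recipe = builder.build()

# Build the output dataset from the scenario
scenario.build_dataset("output_dataset", build_mode='NON_RECURSIVE_FORCED_BUILD')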
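
If all you need for the all-string schema is the list of column names, reading only the header row should be enough and avoids loading the whole file with pandas. A minimal sketch using only the standard library, assuming Python 3; `string_schema_from_header` is a hypothetical helper name, and the `;` separator is assumed to match the one given in `formatParams`:

```python
import csv


def string_schema_from_header(csv_path, separator=";"):
    """Build an all-string schema dict from the header row of a CSV file.

    Hypothetical helper: the returned dict has the same shape as the one
    passed to dataset.set_schema() in the snippet above.
    """
    with open(csv_path, newline="") as handle:
        reader = csv.reader(handle, delimiter=separator)
        header = next(reader)  # raises StopIteration if the file is empty
    return {"columns": [{"name": name, "type": "string"} for name in header]}
```

You would then call `dataset.set_schema(string_schema_from_header(folder_path + file))` instead of going through `df.columns`.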

 

Hi Alan,
Thanks for the reply, I appreciate it!
When I run "df = pd.read_csv(folder_path + file)",
I get this error: file does not exist.
I think it actually tries to read from the DSS server's drive, not from my local machine.
Any thoughts?
Thanks,
Frank
Hi,

Yes, the code runs on the DSS server, so you have to upload the file to the server first (this is also the case if you are using the REST API).
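
If you would rather push the file from your own machine than copy it onto the server by hand, one option is to upload it into a managed folder through the public API client. A rough sketch, assuming the `dataikuapi` package is installed locally and that a managed folder already exists in the project; the host, API key, project key, and folder id are placeholders you have to supply, and `upload_local_file` and `upload_via_public_api` are hypothetical helper names:

```python
import os


def upload_local_file(folder, local_path):
    """Send one local file to a managed folder handle.

    `folder` is expected to expose put_file(name, file_object), as the
    handle returned by project.get_managed_folder() does.
    """
    name = os.path.basename(local_path)
    with open(local_path, "rb") as handle:
        folder.put_file(name, handle)
    return name


def upload_via_public_api(host, api_key, project_key, folder_id, local_path):
    """Connect to DSS from outside the server and upload the file."""
    # Imported here so upload_local_file stays usable without the package.
    import dataikuapi

    client = dataikuapi.DSSClient(host, api_key)
    project = client.get_project(project_key)
    folder = project.get_managed_folder(folder_id)
    return upload_local_file(folder, local_path)
```

Once the file is on the DSS side, the recipe or scenario code can read it from there instead of from a path that only exists on your machine.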