Hi,

I need to load multiple datasets from my local PC into a DSS project.

Is there a way to load all the files together and assign each of them a different dataset name?

At the moment, if multiple files are loaded into DSS (and have the same schema), they are unioned together into a single dataset.

This functionality, if available in DSS, would really streamline the data import process.

Thanks

1 Answer


I agree this would be a useful feature! Most people solve it with Python. You only need a few lines of code: essentially a for loop that reads the files one by one into a pd.DataFrame, appends a column derived from the file name, and concatenates that DataFrame onto an output DataFrame, which you then save back to Dataiku.

import pandas as pd

out = pd.DataFrame(columns=columns)
for i, data in enumerate(useful_station):
    print(i, end=" ")
    try:
        with open(path + data) as f:
            lines = f.readlines()
        # split each whitespace-delimited line into fields
        a = [l.strip("\n").split() for l in lines]
        d = pd.DataFrame(a, columns=columns)
        d[["Month", "Day", "Hour"]] = d[["Month", "Day", "Hour"]].astype(int).astype(str)
        # new column keyed on the file name!
        d["key"] = data + "-" + d["Month"] + "-" + d["Day"] + "-" + d["Hour"]
        if i:
            out = pd.concat((out, d))
        else:
            out = d
    except IOError:
        print("\n********")

The code sample above was built for a similar task; use it as inspiration rather than running it as-is.
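A more generic sketch of the same pattern for plain CSV files might look like the following. Everything here is an assumption, not DSS-specific API: the file pattern, the `source_file` column name, and the commented-out Dataiku write call at the end are placeholders you would adapt to your own project.

```python
import glob
import os

import pandas as pd


def load_folder(pattern):
    """Read every file matching `pattern` into one DataFrame,
    tagging each row with the name of the file it came from."""
    frames = []
    for file_path in glob.glob(pattern):
        d = pd.read_csv(file_path)
        # keep provenance so rows can be traced back to their file
        d["source_file"] = os.path.basename(file_path)
        frames.append(d)
    return pd.concat(frames, ignore_index=True)


# out = load_folder("/path/to/files/*.csv")
# dataiku.Dataset("my_output").write_with_schema(out)  # hypothetical output dataset name
```

Since all files end up in one dataset anyway, the `source_file` column is what lets you split or group them later inside DSS.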
