Creating a data set from scratch in Python

bfilippi · 10-16-2015

Hi

I would like to create a dataset in a notebook from financial data that I am pulling from a web service, Quandl.

They provide an API that lets me download the data as a dataframe:

import Quandl as qd
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

df = qd.get("GOOG/NASDAQ_GOOGL")

df.index

Out[20]:


DatetimeIndex(['2004-08-19', '2004-08-20', '2004-08-23', '2004-08-24', '2004-08-25', '2004-08-26', '2004-08-27', '2004-08-30', '2004-08-31', '2004-09-01', 
               ...
               '2015-09-24', '2015-09-25', '2015-09-28', '2015-09-29', '2015-09-30', '2015-10-01', '2015-10-02', '2015-10-12', '2015-10-13', '2015-10-14'], dtype='datetime64[ns]', name=u'Date', length=2803, freq=None, tz=None)

In [21]:


df.columns

Out[21]:


Index([u'Open', u'High', u'Low', u'Close', u'Volume'], dtype='object')

df is a pandas dataframe.

When I try to write it to a dataset, the write fails. My guess is that the schema isn't initialized properly.

I try to run:

fdc = dataiku.Dataset("qdl")
fdc.write_schema_from_dataframe(df)
fdc.write_with_schema(df)

and it fails with:


Unable to fetch schema for %s : %s'%(self.name,err_msg)

Hence my question:

What is the recommended way to create a dataset in Python from a pandas dataframe?

Thanks!

jrouquie · 10-19-2015

Hi,

This works for me:

from dataiku import Dataset
import Quandl as qd
g = qd.get("GOOG/NASDAQ_GOOGL")
g["date"] = g.index
Dataset("qdl").write_with_schema(g)

This is an easy way to initialize the schema (instead of defining it by hand). Once the schema is initialized, I can replace write_with_schema with write_from_dataframe.
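For anyone who wants to try the pandas part of this without a Quandl account, here is a minimal sketch. The rows are made up and only mimic the shape of the frame in the question (a DatetimeIndex named "Date"), and write_to_dataset is a hypothetical helper name, not part of the dataiku package. It only illustrates the step of copying the index into a regular column; whether that step is what resolves the original error is not something this thread confirms.

```python
import pandas as pd

# Hypothetical stand-in for the frame returned by qd.get(...): made-up rows
# with a DatetimeIndex named "Date", like the one shown in the question.
df = pd.DataFrame(
    {"Open": [100.0, 101.5, 102.25], "Close": [101.0, 102.0, 101.75]},
    index=pd.DatetimeIndex(["2015-10-12", "2015-10-13", "2015-10-14"], name="Date"),
)

# The dates live in the index, so they are not among the columns yet.
has_date_before = "date" in df.columns

# Same step as in the answer above: copy the index into an ordinary column.
df["date"] = df.index
has_date_after = "date" in df.columns


def write_to_dataset(frame, dataset_name="qdl"):
    """Hypothetical helper: write the frame and its schema to a DSS dataset.

    Only works inside a Dataiku environment, so it is defined here but not called.
    """
    from dataiku import Dataset  # lazy import so the pandas part runs anywhere

    Dataset(dataset_name).write_with_schema(frame)
```

Inside a notebook in DSS, calling write_to_dataset(df) would then do the same thing as the last line of the answer.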
