
Hi 

I would like to create a dataset in a notebook from financial data that I pull from a web service, Quandl.

They provide an API that lets me download data directly into a DataFrame:

import Quandl as qd
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

df = qd.get("GOOG/NASDAQ_GOOGL")

df.index

Out[20]:

DatetimeIndex(['2004-08-19', '2004-08-20', '2004-08-23', '2004-08-24', '2004-08-25', '2004-08-26', '2004-08-27', '2004-08-30', '2004-08-31', '2004-09-01', 
               ...
               '2015-09-24', '2015-09-25', '2015-09-28', '2015-09-29', '2015-09-30', '2015-10-01', '2015-10-02', '2015-10-12', '2015-10-13', '2015-10-14'], dtype='datetime64[ns]', name=u'Date', length=2803, freq=None, tz=None)

In [21]:

df.columns

Out[21]:

Index([u'Open', u'High', u'Low', u'Close', u'Volume'], dtype='object')

df is a pandas DataFrame.
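To reproduce this without a Quandl account, a stand-in DataFrame with the same layout can be built with pandas alone: a DatetimeIndex named Date plus the five columns listed above. This is only a sketch; the three dates come from the index shown above, but the prices and volumes are arbitrary placeholders, not real GOOGL data.

```python
import pandas as pd

# Three dates as a DatetimeIndex named "Date", mirroring the Quandl output above
index = pd.DatetimeIndex(["2004-08-19", "2004-08-20", "2004-08-23"], name="Date")

# Placeholder values only (NOT real market data)
df = pd.DataFrame(
    {
        "Open": [1.0, 2.0, 3.0],
        "High": [1.5, 2.5, 3.5],
        "Low": [0.5, 1.5, 2.5],
        "Close": [1.2, 2.2, 3.2],
        "Volume": [100, 200, 300],
    },
    index=index,
)

print(df.index)
print(df.columns.tolist())
```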

My guess is that it fails because the schema isn't initialized properly.

I tried to run:

fdc = dataiku.Dataset("qdl")
fdc.write_schema_from_dataframe(df)
fdc.write_with_schema(df)

and it fails with:

Unable to fetch schema for %s : %s'%(self.name,err_msg)

Hence my question:

What is the recommended way to create a dataset in Python from a pandas DataFrame?

Thanks!

Which version are you using?
I pulled the latest Docker image (dataiku/dss, version 2.1.0).
I installed the Docker version on my Mac at home and got the following error:
import dataiku
import Quandl as qd
df = qd.get("GOOG/NASDAQ_GOOGL")
fdc = dataiku.Dataset("qdl")
fdc.write_from_dataframe(df)

ERROR:root:Exception caught while writing
Traceback (most recent call last):
  File "/home/dataiku/dataiku-dss-2.1.0/python/dataiku/core/dataset_write.py", line 233, in run
    self.streaming_api.wait_write_session(self.session_id)
  File "/home/dataiku/dataiku-dss-2.1.0/python/dataiku/core/dataset_write.py", line 195, in wait_write_session
    msg = 'An error occurred during dataset write (%s): %s'%(id.encode("utf8"), decoded_resp["message"].encode('utf-8'))
AttributeError: 'NoneType' object has no attribute 'encode'
WARNING:requests.packages.urllib3.connectionpool:Connection pool is full, discarding connection: 127.0.0.1
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-9-2a9698bf7f93> in <module>()
      1 fdc = dataiku.Dataset("qdl")
----> 2 fdc.write_from_dataframe(df)

/home/dataiku/dataiku-dss-2.1.0/python/dataiku/core/dataset.pyc in write_from_dataframe(self, df, infer_schema, write_direct)
    651             raise TypeError("write_from_dataframe is a expecting a "
    652                             "DataFrame object. You provided a " +
--> 653                             df.__class__.__name__, e)
    654
    655     def iter_rows(self,

TypeError: ('write_from_dataframe is a expecting a DataFrame object. You provided a DataFrame', AttributeError("'NoneType' object has no attribute 'encode'",))

I then tried to load the data through the DSS GUI, using an HTTP connection (https://www.quandl.com/api/v3/datasets/GOOG/NASDAQ_GOOGL.csv?start_date=2004-08-18&end_date=2015-10-15).
This created a dataset, but when I loaded it in Python, the index was made of integers rather than dates.
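An integer index is what you get whenever a date column is read back as an ordinary column instead of as the index. The same effect can be reproduced with plain pandas; this sketch does not use DSS at all, and it assumes the Quandl CSV header looks like Date,Open,High,Low,Close,Volume (the values are placeholders).

```python
import io

import pandas as pd

# Tiny CSV with an assumed Quandl-style header; values are placeholders
csv_text = """Date,Open,High,Low,Close,Volume
2004-08-19,1.0,1.5,0.5,1.2,100
2004-08-20,2.0,2.5,1.5,2.2,200
"""

# Default read: "Date" stays an ordinary column and the index is 0, 1, ...
plain = pd.read_csv(io.StringIO(csv_text))
print(plain.index)

# Using "Date" as a parsed index restores the DatetimeIndex layout from Quandl
dated = pd.read_csv(io.StringIO(csv_text), index_col="Date", parse_dates=True)
print(dated.index)
```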

Could the date index in my DataFrame be the issue?

1 Answer

Hi,

This works for me:

from dataiku import Dataset
import Quandl as qd
g = qd.get("GOOG/NASDAQ_GOOGL")
g["date"] = g.index  # copy the DatetimeIndex into a regular "date" column
Dataset("qdl").write_with_schema(g)

This is an easy way to initialize the schema (instead of defining it by hand). Once the schema is initialized, I can replace write_with_schema with write_from_dataframe.
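The answer works around the date index by copying it into a regular column. Below is a pandas-only sketch of that step, next to DataFrame.reset_index, which moves the index into a column in one call. The Dataset write is left out because it needs a running DSS instance, and g here is a small placeholder frame, not real Quandl data.

```python
import pandas as pd

# Placeholder stand-in for the Quandl result: DatetimeIndex named "Date"
g = pd.DataFrame(
    {"Close": [1.2, 2.2]},
    index=pd.DatetimeIndex(["2004-08-19", "2004-08-20"], name="Date"),
)

# Option 1 (as in the answer): copy the index into a new "date" column, keeping the index
g_copy = g.copy()
g_copy["date"] = g_copy.index

# Option 2: move the index into a regular "Date" column and get a plain 0..n-1 index
g_flat = g.reset_index()

# In DSS, either frame could then be written, e.g. Dataset("qdl").write_with_schema(g_flat)
print(g_copy.columns.tolist())
print(g_flat.columns.tolist())
```

reset_index keeps the original column name (Date) and leaves g unchanged, since it returns a new DataFrame.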

©Dataiku 2012-2018 - Privacy Policy