I am working on deploying an API on Dataiku with a dataset lookup endpoint. I can retrieve individual rows this way using the API, but I'd like to get multiple rows or even the whole dataset. Is this possible?
Hi,
No, it is not possible to return multiple rows with a lookup endpoint. You will need to use a custom Python endpoint for that.
Best,
Vitaliy
Do you know if I can reference a dataset in a Python endpoint? I tried the standard way and it didn't work, i.e.:
import dataiku
dataset = dataiku.Dataset('dataset_name')
df = dataset.get_dataframe()
Hi,
The API node is a separate instance from the Design node, so a dataset that lives on the Design node has to be read through the Dataiku API in remote mode. See the example below:
import dataiku
host = "http://dss_host:port"  # replace with the full DSS URL (scheme, host, and port)
api_key = "api_key"  # replace with your API key
dataiku.set_remote_dss(host, api_key)
dataiku.set_default_project_key("project_key")  # replace with the key of the project that contains the dataset
# Example: load a DSS dataset as a Pandas dataframe
mydataset = dataiku.Dataset("mydataset_name")
mydataset_df = mydataset.get_dataframe()
However, please note that, in general, the API node shouldn't be used to load large datasets or perform heavy computations.
Best,
Vitaliy
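Putting the two replies together, a custom Python endpoint that returns several rows might look roughly like the sketch below. It is only an illustration under a few assumptions: the endpoint function is called api_py_function, the key column is named "id", and MAX_ROWS, select_rows, and load_records are hypothetical names that do not come from this thread. The dataset is read once through the remote API and cached, and the number of returned rows is capped, in line with the advice not to move large datasets through the API node.

```python
import functools

MAX_ROWS = 1000  # hypothetical cap so a single query never returns a huge payload


def select_rows(records, key_column, keys, limit=MAX_ROWS):
    """Return up to `limit` rows whose `key_column` value is in `keys`."""
    wanted = set(keys)
    matches = [row for row in records if row.get(key_column) in wanted]
    return matches[:limit]


@functools.lru_cache(maxsize=1)
def load_records():
    """Read the dataset once via the remote Dataiku API and cache it in memory."""
    import dataiku  # imported lazily; only needed when the endpoint actually runs

    # Placeholders, same as in the reply above: replace with your own values.
    dataiku.set_remote_dss("http://dss_host:port", "api_key")
    dataiku.set_default_project_key("project_key")
    df = dataiku.Dataset("mydataset_name").get_dataframe()
    return df.to_dict(orient="records")  # list of {column: value} dicts


def api_py_function(keys):
    """Endpoint entry point (assumed name): `keys` is a list of values to look up."""
    # "id" is an assumed key column; use the column your dataset is keyed on.
    return select_rows(load_records(), "id", keys)
```

The filtering is kept in a separate pure function so it can be checked without a DSS connection, for example by calling select_rows on a small list of dicts. Caching the whole dataset in memory only makes sense when it is small; otherwise filtering closer to the data source is likely the better option.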