Python recipe picks the dataset name automatically for SQL Query

ankitmahato · 07-13-2017

The example at https://www.dataiku.com/learn/guide/code/python/use-python-sql.html assumes that the user already knows the name of the dataset (sfo_prepared) they are operating on.

How can we make the plugin more generic and obtain the name of the dataset from the flow? The recipe should automatically pick up the dataset name when it is connected to a dataset.


# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

# Import the class that allows us to execute SQL on the Studio connections
from dataiku.core.sql import SQLExecutor2

# Get a handle on the input dataset
sfo_prepared = dataiku.Dataset("sfo_prepared")

# We create an executor. We pass to it the dataset instance. This way, the
# executor knows which SQL database should be targeted
executor = SQLExecutor2(dataset=sfo_prepared)

# Get the 5 most frequent manufacturers by total landing count 
# (over the whole period)
mf_manufacturers = executor.query_to_df(
    """
    select      "Aircraft Manufacturer" as manufacturer,
                sum("Landing Count") as count
            from sfo_prepared
            group by "Aircraft Manufacturer"
            order by count desc limit 5
    """)

Mattsco · 07-13-2017

Hello,

Yes, you can find the solution in the plugin tutorial.

Here is an example:


# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku.customrecipe import get_input_names_for_role

# Read the input

# There is only one input, and it's mandatory so we can access [0]
main_input_name = get_input_names_for_role('main')[0]
input_dataset = dataiku.Dataset(main_input_name)

df = input_dataset.get_dataframe()
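
To tie this back to the SQL query in the question, here is a rough sketch of how the two pieces could be combined. It is not taken from the tutorial. The helpers quote_identifier, build_top_manufacturers_query and top_manufacturers_from_flow are hypothetical names of mine. It assumes a database that uses double-quoted identifiers (as the original query does) and that, for a SQL dataset, get_location_info()["info"] holds the table name under "table" and optionally the schema under "schema". Please check this against your DSS version.

```python
def quote_identifier(name):
    """Double-quote a SQL identifier (PostgreSQL-style quoting, an assumption)."""
    return '"' + name.replace('"', '""') + '"'


def build_top_manufacturers_query(table, schema=None, limit=5):
    """Build the query from the question for an arbitrary table name."""
    qualified = quote_identifier(table)
    if schema:
        qualified = quote_identifier(schema) + "." + qualified
    return (
        'select "Aircraft Manufacturer" as manufacturer, '
        'sum("Landing Count") as count '
        "from " + qualified + " "
        'group by "Aircraft Manufacturer" '
        "order by count desc limit " + str(int(limit))
    )


def top_manufacturers_from_flow(role="main"):
    """Run the query on whichever dataset is connected to the given input role."""
    # Imported lazily because these modules are only available inside DSS.
    import dataiku
    from dataiku.customrecipe import get_input_names_for_role
    from dataiku.core.sql import SQLExecutor2

    # Name of the dataset wired to the recipe input, instead of a hard-coded one
    input_dataset = dataiku.Dataset(get_input_names_for_role(role)[0])

    # Assumption: for a SQL dataset, "info" exposes "table" and maybe "schema"
    info = input_dataset.get_location_info()["info"]

    executor = SQLExecutor2(dataset=input_dataset)
    query = build_top_manufacturers_query(info["table"], info.get("schema"))
    return executor.query_to_df(query)
```

Inside the plugin recipe you would then call mf_manufacturers = top_manufacturers_from_flow(). The query builder is kept separate from the DSS calls so it can be checked outside the Studio.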

Mattsco

ankitmahato · 07-13-2017

Got it mate! Thanks.
