How can I select a column by its position within a recipe?
Is there a way to select a column using formula language [e.g. something like Column1 rather than val("column_name")]?
Is there a way to select a column using another type of step within a prepare recipe?
Thanks
Hi @Rickh008 ,
You can accomplish this by using a Python recipe.
The following example will add a new column to the dataset that contains the value of the 5th column (index 4):
import dataiku

# Name of the column that will be created
COLUMN_NAME = "nth_column"

# Index (position) of the column whose value you want to copy.
# The index starts from 0, so the 2nd column has an index of 1.
COLUMN_INDEX = 4

# Read recipe inputs
input_dataset = dataiku.Dataset("INPUT_DATASET")
dataframe = input_dataset.get_dataframe()

# Create a new column where the value is the value of the nth column
dataframe[COLUMN_NAME] = dataframe.iloc[:, COLUMN_INDEX]

# Write recipe outputs
output_dataset = dataiku.Dataset("OUTPUT_DATASET")
output_dataset.write_with_schema(dataframe)
You can change the position of the column that is selected by changing the COLUMN_INDEX variable.
Once the new column is created, you can then use it in any downstream recipes.
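If you want to try the positional lookup outside of DSS first, here is a minimal sketch using plain pandas. The helper name copy_column_by_position and the sample data are made up for illustration; they are not part of the Dataiku API. It also shows how to read the name of the column at a given position, which may help if you would rather reference that name in a later step.

```python
import pandas as pd


def copy_column_by_position(df, index, new_name):
    """Return a copy of df with a new column holding the values of the
    column at the given 0-based position (hypothetical helper)."""
    if not 0 <= index < df.shape[1]:
        raise IndexError(
            f"Column index {index} is out of range for {df.shape[1]} columns"
        )
    result = df.copy()
    # iloc[:, index] selects every row of the column at that position
    result[new_name] = df.iloc[:, index]
    return result


# Made-up sample data standing in for the recipe's input dataset
sample = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

result = copy_column_by_position(sample, 1, "nth_column")

# Name of the column found at position 1
print(sample.columns[1])               # b
print(result["nth_column"].tolist())   # [3, 4]
```

The helper works on a copy so the input dataframe is left unchanged, and the explicit range check gives a clearer error than the default one if the dataset has fewer columns than expected.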
Thanks,
Zach