Hi Team,
My workflow is very large, and I want to extract the names of the datasets under a given tag. Is there any way to get the list of datasets that carry a tag?
Thanks in advance
Hello,
You can do this from the Datasets menu in the GUI, or from a notebook in the project:
```python
import dataiku

client = dataiku.api_client()
# projectKey is the built-in variable holding the current project's key
project = client.get_project(dataiku.get_custom_variables()["projectKey"])
datasets = project.list_datasets()

tag_name = 'sql_dataset'
for ds in datasets:
    # Each item carries the dataset's 'name' and its list of 'tags'
    if tag_name in (ds['tags'] or []):
        print("dataset '{}' is tagged with '{}'".format(ds['name'], tag_name))
```
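If you need the datasets for every tag at once rather than for a single tag_name, the same list can be inverted into a tag-to-datasets mapping. This is a minimal sketch, assuming each item returned by project.list_datasets() behaves like a dict with 'name' and 'tags' keys, as the loop above assumes; the function name datasets_by_tag and the sample data are hypothetical.

```python
from collections import defaultdict


def datasets_by_tag(datasets):
    """Map each tag to the sorted names of the datasets carrying it.

    `datasets` is assumed to be a list of dict-like items with 'name' and
    'tags' keys, such as the output of project.list_datasets() inside DSS.
    """
    index = defaultdict(set)
    for ds in datasets:
        # `or []` guards against a missing or None 'tags' entry
        for tag in ds.get('tags') or []:
            index[tag].add(ds['name'])
    return {tag: sorted(names) for tag, names in index.items()}


if __name__ == '__main__':
    # Hypothetical sample data standing in for project.list_datasets()
    sample = [
        {'name': 'orders', 'tags': ['sql_dataset', 'sales']},
        {'name': 'customers', 'tags': ['sql_dataset']},
        {'name': 'scratch', 'tags': []},
    ]
    for tag, names in sorted(datasets_by_tag(sample).items()):
        print(tag, names)
```

Inside DSS, something like datasets_by_tag(project.list_datasets()).get('sql_dataset', []) would then return the datasets for one tag without a KeyError when the tag is unused.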
Hi,
Can we get the dataset names and their corresponding tags in an Excel sheet?
Yes, you have several options. For example, collect the results into a pandas DataFrame and save it with pandas.DataFrame.to_excel():
```python
import dataiku
import pandas as pd

client = dataiku.api_client()
project = client.get_project(dataiku.get_custom_variables()["projectKey"])
datasets = project.list_datasets()

result_dict = {'dataset': [], 'tags': []}
for ds in datasets:
    # Keep only datasets that carry at least one tag
    if ds['tags']:
        result_dict['dataset'].append(ds['name'])
        # Join the tag list into one readable cell, e.g. "sql_dataset, sales"
        result_dict['tags'].append(', '.join(ds['tags']))

df = pd.DataFrame(data=result_dict)
df.to_excel('output1.xlsx')
```
This saves the XLSX file into the DATA_DIR/jupyter-run/dku-workdirs/MY_PROJECT/recipe_name/ folder.
P.S. If you are running an older version of DSS, or the code env used to run the notebook has the legacy pandas==0.23, you will need to install xlsxwriter into that code env and add import xlsxwriter to the notebook.
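If installing an Excel writer into the code env is not an option, a CSV file is a lighter alternative that Excel can open directly. The sketch below uses only Python's standard csv module and writes one row per dataset/tag pair, which is easier to filter in a spreadsheet than a single cell holding several tags. It assumes the same dict-like items with 'name' and 'tags' keys as the code above; the function name write_tag_csv and the file name are hypothetical.

```python
import csv


def write_tag_csv(datasets, path):
    """Write one (dataset, tag) row per tag to a CSV file; return the row count.

    `datasets` is assumed to be a list of dict-like items with 'name' and
    'tags' keys, e.g. the output of project.list_datasets() inside DSS.
    """
    rows = [
        (ds['name'], tag)
        for ds in datasets
        # `or []` skips items whose 'tags' entry is missing or None
        for tag in (ds.get('tags') or [])
    ]
    # newline='' is what the csv module expects for files it writes
    with open(path, 'w', newline='', encoding='utf-8') as f:
        writer = csv.writer(f)
        writer.writerow(['dataset', 'tag'])
        writer.writerows(rows)
    return len(rows)
```

Inside DSS this would be called as write_tag_csv(project.list_datasets(), 'dataset_tags.csv'); with a relative path like this, the file presumably lands in the same working folder as the XLSX output above.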