How to convert a CSV file to a JSON file using R or Python in Dataiku
asked by Doris

1 Answer

A CSV dataset in Dataiku can be read into Python as a pandas DataFrame, so I would try the DataFrame's to_json() method to convert it to JSON: https://pandas.pydata.org/pandas-docs/stable/generated/pandas.DataFrame.to_json.html
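As a rough illustration of what to_json() returns, here is a minimal sketch on a small made-up DataFrame (the column names and values are invented for the example, not taken from any real dataset). The default layout groups values by column; orient="records" gives one JSON object per row, which is often what people expect from a CSV conversion.

```python
import json

import pandas as pd

# Made-up sample data standing in for the CSV dataset
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

# Default orient ("columns"): {column -> {row index -> value}}
by_column = df.to_json()

# orient="records": a list with one object per row
by_record = df.to_json(orient="records")

print(by_column)
print(by_record)
```

Both results are plain Python strings, so they can be stored in a dataset cell or written to a file as needed.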
answered by Alex
Hi Alex,

I tried it. However, it does not work in a Python recipe, because the recipe has to write a dataset at the end.

Do you have examples with more details about how that works?

I think I've misunderstood what you're trying to do. What's the end goal for the JSON? Do you want it passed as a cell of a downstream Dataiku dataset, written to an external file, or something else?
Hi Alex,

Ideally, I want both. First, I want to write it to a cell in a dataset in the Dataiku flow. At some point in the future, I might also need to write it to an external file.

Cool; so, the following code could be used in a Python recipe to read a Dataiku dataset, convert it to JSON, write the JSON back to a Dataiku dataset, and write it out to a file. "input_dataset" can be changed to the name of the recipe's input Dataiku dataset, "output_dataset" to the name of the output dataset, and "output_file" to the path where you want the JSON written on the filesystem.

# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu

# Recipe inputs (named input_dataset to avoid shadowing the built-in input())
input_dataset = dataiku.Dataset("input_dataset")
input_df = input_dataset.get_dataframe()

# Convert to json (a single string)
input_json = input_df.to_json()

# Convert json to a one row, one column data frame
input_json_df = pd.DataFrame(data=[input_json], columns=['json'])

# Write new data frame back to Dataiku dataset
output = dataiku.Dataset("output_dataset")
output.write_with_schema(input_json_df)

# Write json to external file; the with block closes the file when done
with open('output_file', 'w') as f:
    f.write(input_json)
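
If the JSON cell ever needs to be turned back into a table in a later step, something like the sketch below should work. It uses only pandas and a made-up DataFrame in place of the Dataiku datasets, so the sample data and variable names are assumptions for illustration.

```python
import io

import pandas as pd

# Made-up data standing in for the input dataset
df = pd.DataFrame({"id": [1, 2], "name": ["a", "b"]})

# Same wrapping as in the recipe: one row, one column holding the JSON string
json_str = df.to_json()
cell_df = pd.DataFrame(data=[json_str], columns=["json"])

# Later on: pull the string out of the cell and parse it back into a DataFrame
restored = pd.read_json(io.StringIO(cell_df.loc[0, "json"]))
print(restored)
```

Column dtypes are inferred again on the way back in, so it is worth checking them if the original data had dates or mixed types.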
Thanks!! It works:)