Survey banner
Switching to Dataiku - a new area to help users who are transitioning from other tools and diving into Dataiku! CHECK IT OUT

Taking Snowflake dataset to pipe delimited txt file on SFTP

Solved!
ccecil
Level 3
Taking Snowflake dataset to pipe delimited txt file on SFTP

I'm looking to take a Snowflake dataset (that I'm generating/refreshing using a scenario step) and put it into a pipe delimited txt file that I'll drop onto SFTP. Having some trouble sorting out the steps I need to take to get there. From what I've gathered, I need to do the following

1) Drop Snowflake dataset (generated by scenario) in my flow - Completed

2) Use 'Export to Folder' visual recipe to drop dataset there - Not Completed

3) Use Python visual recipe to use the data in the managed folder to output this to SFTP - Not Completed

 

Do these steps seem correct? 

I've been leveraging a few prior solutions (https://community.dataiku.com/t5/Using-Dataiku/Create-a-file-in-sftp/m-p/8967) but still a bit stuck. 


Operating system used: Windows

0 Kudos
1 Solution
AlexT
Dataiker

Hi @ccecil ,

"Export to folder" recipe does not support using a pipe-delimited output file to the managed folder.
Supported format are :
Screenshot 2023-06-04 at 10.28.24 PM.png

So indeed you would need to use a Python recipe as you suggested. To write directly to an SFTP folder you can create a SFTP managed folder, first add the SFTP connection and then choose the output of your Python recipe to the respective managed folder/path. You may also opt to use directly a python library e.g pysftp/paramiko depending on your exact needs and sftp server.

 The Python recipe would need to simply read and write out the using pandas and the delimited ""|".

import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
from datetime import datetime

#replace with your managed folder
managed_folder_id = "URKU7Oqb"
input_dataset_name = "customers_labeled_prepared"

# Read dataset 
my_dataset = dataiku.Dataset(input_dataset_name)
#if the dataset can't fit into memory you may opt to use chuncked reading/writting https://developer.dataiku.com/latest/concepts-and-examples/datasets/datasets-data.html#chunked-reading-and-writing-with-pandas

df = my_dataset.get_dataframe()

# Write recipe outputs
output_folder = dataiku.Folder(managed_folder_id)
output_folder.upload_stream("filename.txt , df.to_csv(index=False, sep='|').encode("utf-8"))

 

View solution in original post

2 Replies
AlexT
Dataiker

Hi @ccecil ,

"Export to folder" recipe does not support using a pipe-delimited output file to the managed folder.
Supported format are :
Screenshot 2023-06-04 at 10.28.24 PM.png

So indeed you would need to use a Python recipe as you suggested. To write directly to an SFTP folder you can create a SFTP managed folder, first add the SFTP connection and then choose the output of your Python recipe to the respective managed folder/path. You may also opt to use directly a python library e.g pysftp/paramiko depending on your exact needs and sftp server.

 The Python recipe would need to simply read and write out the using pandas and the delimited ""|".

import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
from datetime import datetime

#replace with your managed folder
managed_folder_id = "URKU7Oqb"
input_dataset_name = "customers_labeled_prepared"

# Read dataset 
my_dataset = dataiku.Dataset(input_dataset_name)
#if the dataset can't fit into memory you may opt to use chuncked reading/writting https://developer.dataiku.com/latest/concepts-and-examples/datasets/datasets-data.html#chunked-reading-and-writing-with-pandas

df = my_dataset.get_dataframe()

# Write recipe outputs
output_folder = dataiku.Folder(managed_folder_id)
output_folder.upload_stream("filename.txt , df.to_csv(index=False, sep='|').encode("utf-8"))

 

ccecil
Level 3
Author

Thanks, @AlexT . I leveraged code that almost mirrors what you provided to get this done. I appreciate it!

0 Kudos

Labels

?

Setup info

?
Tags (1)
A banner prompting to get Dataiku