Export Redshift Data in CSV with Header on SFTP server

raviagrawal
Level 3
Export Redshift Data in CSV with Header on SFTP server
Hi,



I have requirement where I have to export the Dataiku process output from redshift to CSV file with header on a SFTP site for a complete automation of the process. Could you please provide a possible way how to achieve this ?



Regards,

Ravi Agrawal
0 Kudos
5 Replies
Mattsco
Dataiker

Hi Ravi, 



You can use the "Export to folder" recipe and specify an already setup connection to your SFTP.

Matt

Mattsco
raviagrawal
Level 3
Author
Hi,

Is it possible to append the datetimestamp at the end of the file name using this recipe ?

Also I used the export folder functionality and it failed with following log

[11:03:33] [INFO] [dku] running compute_6xb7d0uO_NP - ----------------------------------------
[11:03:33] [INFO] [dku] running compute_6xb7d0uO_NP - DSS startup: jek version:5.1.5
[11:03:33] [INFO] [dku] running compute_6xb7d0uO_NP - DSS home: /home/dataiku/DATA_DIR
[11:03:33] [INFO] [dku] running compute_6xb7d0uO_NP - OS: Linux 4.14.72-68.55.amzn1.x86_64 amd64 - Java: Oracle Corporation 1.8.0_191
[11:03:33] [INFO] [dku.flow.jobrunner] running compute_6xb7d0uO_NP - Allocated a slot for this activity!
[11:03:33] [INFO] [dku.flow.jobrunner] running compute_6xb7d0uO_NP - Run activity
[11:03:33] [INFO] [dku.flow.activity] running compute_6xb7d0uO_NP - Executing default pre-activity lifecycle hook
[11:03:33] [INFO] [dip.provider.connection.prepare] running compute_6xb7d0uO_NP - Setup connection with encrypted fields CDP_mbox
[11:03:33] [INFO] [dku.managedfolders.handler] running compute_6xb7d0uO_NP - Create provider for IQVIATRIGGER.RatkEqgX with path IQVIATRIGGER/RatkEqgX
[11:03:33] [INFO] [dku.fsproviders.ssh] running compute_6xb7d0uO_NP - Effective root : 'IQVIATRIGGER/RatkEqgX' from 'null' / 'IQVIATRIGGER/RatkEqgX' useRelativePaths=true
[11:03:33] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - Connecting to mboxnaprd.jnj.com port 22
[11:03:33] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - Connection established
[11:03:33] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - Remote version string: SSH-2.0-CrushFTPSSHD
[11:03:33] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - Local version string: SSH-2.0-JSCH-0.1.54
[11:03:33] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - CheckCiphers: aes256-ctr,aes192-ctr,aes128-ctr,aes256-cbc,aes192-cbc,aes128-cbc,3des-ctr,arcfour,arcfour128,arcfour256
[11:03:33] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - CheckKexes: diffie-hellman-group14-sha1,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521
[11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - CheckSignatures: ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521
[11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - SSH_MSG_KEXINIT sent
[11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - SSH_MSG_KEXINIT received
[11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: server: curve25519-sha256@libssh.org,diffie-hellman-group-exchange-sha256,diffie-hellman-group18-sha512,diffie-hellman-group17-sha512,diffie-hellman-group16-sha512,diffie-hellman-group15-sha512,diffie-hellman-group14-sha256,diffie-hellman-group14-sha1,ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group-exchange-sha1
[11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: server: ssh-dss,ssh-rsa
[11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: server: aes256-ctr,aes128-ctr,aes256-cbc,3des-ctr,3des-cbc,aes192-ctr,aes192-cbc,aes128-cbc,blowfish-cbc,arcfour128,arcfour
[11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: server: aes256-ctr,aes128-ctr,aes256-cbc,3des-ctr,3des-cbc,aes192-ctr,aes192-cbc,aes128-cbc,blowfish-cbc,arcfour128,arcfour
[11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: server: hmac-md5,hmac-sha256,hmac-sha2-256,hmac-sha2-256-96,hmac-sha512,hmac-sha2-512,hmac-sha2-512-96,hmac-sha1,hmac-sha1-96,hmac-ripemd160,hmac-md5-96
[11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: server: hmac-md5,hmac-sha256,hmac-sha2-256,hmac-sha2-256-96,hmac-sha512,hmac-sha2-512,hmac-sha2-512-96,hmac-sha1,hmac-sha1-96,hmac-ripemd160,hmac-md5-96
[11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: server: none,zlib
[11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: server: none,zlib
[11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: server:
[11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: server:
[11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: client: ecdh-sha2-nistp256,ecdh-sha2-nistp384,ecdh-sha2-nistp521,diffie-hellman-group14-sha1,diffie-hellman-group-exchange-sha256,diffie-hellman-group-exchange-sha1,diffie-hellman-group1-sha1
[11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: client: ssh-rsa,ssh-dss,ecdsa-sha2-nistp256,ecdsa-sha2-nistp384,ecdsa-sha2-nistp521
[11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: client: aes128-ctr,aes128-cbc,3des-ctr,3des-cbc,blowfish-cbc,aes192-ctr,aes192-cbc,aes256-ctr,aes256-cbc
[11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: client: aes128-ctr,aes128-cbc,3des-ctr,3des-cbc,blowfish-cbc,aes192-ctr,aes192-cbc,aes256-ctr,aes256-cbc
[11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: client: hmac-md5,hmac-sha1,hmac-sha2-256,hmac-sha1-96,hmac-md5-96
[11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: client: hmac-md5,hmac-sha1,hmac-sha2-256,hmac-sha1-96,hmac-md5-96
[11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: client: none
[11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: client: none
[11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: client:
[11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: client:
[11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: server->client aes128-ctr hmac-md5 none
[11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - kex: client->server aes128-ctr hmac-md5 none
[11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - SSH_MSG_KEX_ECDH_INIT sent
[11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - expecting SSH_MSG_KEX_ECDH_REPLY
[11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - ssh_rsa_verify: signature true
[11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - Host 'mboxnaprd.jnj.com' is known and matches the RSA host key
[11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - SSH_MSG_NEWKEYS sent
[11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - SSH_MSG_NEWKEYS received
[11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - SSH_MSG_SERVICE_REQUEST sent
[11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - SSH_MSG_SERVICE_ACCEPT received
[11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - Authentications that can continue: password
[11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - Next authentication method: password
[11:03:34] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - Authentication succeeded (password).
[11:03:34] [DEBUG] [dku.remotefiles.sftp] running compute_6xb7d0uO_NP - Stats for remote file 'IQVIATRIGGER/RatkEqgX'
[11:03:34] [ERROR] [dku.fsproviders.ssh] running compute_6xb7d0uO_NP - Dataset root path does not exist : IQVIATRIGGER/RatkEqgX
java.io.FileNotFoundException
at com.dataiku.dip.datasets.fs.SSHFSProvider.stat(SSHFSProvider.java:320)
at com.dataiku.dip.datasets.fs.SSHFSProvider.ensureDirectory(SSHFSProvider.java:191)
at com.dataiku.dip.managedfolder.ManagedFolderHandler.ensurePartitionFolder(ManagedFolderHandler.java:587)
at com.dataiku.dip.dataflow.DefaultActivityLifecycleHook.handleTargetFolderPre(DefaultActivityLifecycleHook.java:77)
at com.dataiku.dip.dataflow.DefaultActivityLifecycleHook.onPreActivity(DefaultActivityLifecycleHook.java:138)
at com.dataiku.dip.dataflow.jobrunner.ActivityRunner.runActivity(ActivityRunner.java:558)
at com.dataiku.dip.dataflow.jobrunner.JobRunner.runActivity(JobRunner.java:125)
at com.dataiku.dip.dataflow.jobrunner.JobRunner.access$900(JobRunner.java:36)
at com.dataiku.dip.dataflow.jobrunner.JobRunner$ActivityExecutorThread.run(JobRunner.java:312)
[11:03:34] [DEBUG] [dku.remotefiles.sftp] running compute_6xb7d0uO_NP - Stats for remote file IQVIATRIGGER/RatkEqgX
[11:03:35] [DEBUG] [dku.remotefiles.sftp] running compute_6xb7d0uO_NP - Stats for remote file 'IQVIATRIGGER'
[11:03:45] [INFO] [dku.remotefiles.ssh.jsch] running compute_6xb7d0uO_NP - Disconnecting from mboxnaprd.jnj.com port 22
[11:03:45] [ERROR] [dku.flow.jobrunner] running compute_6xb7d0uO_NP - Activity unexpectedly failed
java.io.IOException: SFTP error
at com.dataiku.dip.input.remote.SFTPRemote.mkdirs(SFTPRemote.java:197)
at com.dataiku.dip.datasets.fs.SSHFSProvider.ensureDirectory(SSHFSProvider.java:197)
at com.dataiku.dip.managedfolder.ManagedFolderHandler.ensurePartitionFolder(ManagedFolderHandler.java:587)
at com.dataiku.dip.dataflow.DefaultActivityLifecycleHook.handleTargetFolderPre(DefaultActivityLifecycleHook.java:77)
at com.dataiku.dip.dataflow.DefaultActivityLifecycleHook.onPreActivity(DefaultActivityLifecycleHook.java:138)
at com.dataiku.dip.dataflow.jobrunner.ActivityRunner.runActivity(ActivityRunner.java:558)
at com.dataiku.dip.dataflow.jobrunner.JobRunner.runActivity(JobRunner.java:125)
at com.dataiku.dip.dataflow.jobrunner.JobRunner.access$900(JobRunner.java:36)
at com.dataiku.dip.dataflow.jobrunner.JobRunner$ActivityExecutorThread.run(JobRunner.java:312)
Caused by: 4: %MKD-bad%
at com.jcraft.jsch.ChannelSftp.throwStatusError(ChannelSftp.java:2873)
at com.jcraft.jsch.ChannelSftp.mkdir(ChannelSftp.java:2182)
at com.dataiku.dip.input.remote.SFTPRemote.mkdirs(SFTPRemote.java:191)
... 8 more
[11:03:45] [INFO] [dku.remotefiles.ssh.jsch] - Caught an exception, leaving main loop due to Socket closed

Regards,
Ravi Agrawal
0 Kudos
Mattsco
Dataiker
Is it possible to append the datetimestamp?
No, we don't have the feature implemented.

For the error, what version of DSS do you use?

A first guess is that the file is too big for the throughput of the connection/SFTP server that closes the connection after a while if the upload takes too much time.

You can try with smaller files first and(or) upload directly from an SFTP client outside of DSS to check the stability of the upstream connection?
Mattsco
0 Kudos
raviagrawal
Level 3
Author
Hi,

We are using Dataiku 5.1.5 version and file is only 900 records.

My guess is that issue is because of following command but I am not sure how to correct this one.

[11:03:33] [INFO] [dku.fsproviders.ssh] running compute_6xb7d0uO_NP - Effective root : 'IQVIATRIGGER/RatkEqgX' from 'null' / 'IQVIATRIGGER/RatkEqgX' useRelativePaths=true

[11:03:34] [DEBUG] [dku.remotefiles.sftp] running compute_6xb7d0uO_NP - Stats for remote file 'IQVIATRIGGER/RatkEqgX'
[11:03:34] [ERROR] [dku.fsproviders.ssh] running compute_6xb7d0uO_NP - Dataset root path does not exist : IQVIATRIGGER/RatkEqgX
java.io.FileNotFoundException

Regards,
Ravi Agrawal
0 Kudos
raviagrawal
Level 3
Author
Hi,

I am able to put the file on FTP server but I have two issues
a. File Name is same as table name and I want to change it to the new name.
b. Adding the Datatimestamp at the end of the file.

I referred to following link to replicate the process, but somehow file is not being created

https://answers.dataiku.com/1038/export-to-folder-different-name-each-day

Could you please check and suggest something ?

Regards,
Ravi Agrawal
0 Kudos

Labels

?
Labels (3)
A banner prompting to get Dataiku