How do I backup my instance of Data Science Studio?

Solved!
UserBird
Dataiker
How do I backup my instance of Data Science Studio?
 
1 Solution
jrouquie
Dataiker Alumni

Backup



First, locate your data directory (referred to as DATA_DIR in our documentation). This directory contains your configuration, projects (graphs, recipes, notebooks, etc.), connections to databases, the filesystem_managed files, etc. Note that this directory may NOT contain all your data: data hosted on a database or on a Hadoop cluster, for instance, lives outside DATA_DIR and is not covered by this backup.



Make sure no job is running, then stop your instance:




DATA_DIR/bin/dss stop


Compress DATA_DIR and save the archive somewhere else:




tar -zcvf your_backup.tar.gz /path/to/DATA_DIR/


Finally, restart the studio:




DATA_DIR/bin/dss start
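Before relying on an archive, it can be worth checking that it is readable and actually contains the data directory. The following is a minimal sketch, not part of DSS: `verify_backup` is a hypothetical helper name, and the expected entry is given without a leading slash because tar strips it when the archive is created.

```shell
#!/bin/bash
# Sketch: check that a backup archive can be read and contains a given entry.
# verify_backup is a hypothetical helper, not a DSS command.
verify_backup() {
    local archive="$1"
    local expected="$2"   # e.g. "path/to/DATA_DIR/bin/dss" (no leading slash)
    local listing

    # tar -tzf lists the archive members; it fails if the archive is corrupt
    listing=$(tar -tzf "$archive") || return 1

    # Succeed only if the expected entry appears as a whole line in the listing
    printf '%s\n' "$listing" | grep -Fxq -- "$expected"
}
```

For example, `verify_backup your_backup.tar.gz path/to/DATA_DIR/bin/dss` returns a non-zero status if the archive is damaged or the entry is missing.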


 



Running an automatic backup



Here is an example of a bash script you can run periodically with cron:




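#!/bin/bash
# Purpose: back up the DSS data directory

SRCDIR="/path/to/DATA_DIR"
DSTDIR="/home/backups"
TIME=$(date +"%Y-%m-%d")
FILENAME="backup-dss-data-$TIME.tar.gz"

# Stop DSS so that no file changes while the archive is being written
"$SRCDIR/bin/dss" stop

# Use maximum gzip compression; -p preserves file permissions
export GZIP=-9
tar -cpzf "$DSTDIR/$FILENAME" "$SRCDIR"
STATUS=$?

# Restart DSS even if the archive step failed
"$SRCDIR/bin/dss" start

# Report the tar result so that cron can flag a failed backup
exit $STATUS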


You could save this script in a file named backupscript.sh, make it executable, and set up a cron task like the following one (which runs Monday to Friday at 6:15 am):




15 6 * * 1-5 /path/to/backupscript.sh
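Because the script writes one dated archive per run, the destination directory grows over time. As a rough sketch of one way to handle that, assuming the `backup-dss-data-*.tar.gz` naming used in the script above, the hypothetical helper below (not part of DSS) deletes archives older than a given number of days; the retention period is an arbitrary choice.

```shell
#!/bin/bash
# Sketch: remove old backup archives from a directory.
# prune_backups is a hypothetical helper, not a DSS command.
prune_backups() {
    local dir="$1"
    local days="$2"

    # Match only regular files directly inside $dir that follow the backup
    # naming scheme and were last modified more than $days days ago,
    # print their names, then delete them.
    find "$dir" -maxdepth 1 -type f -name 'backup-dss-data-*.tar.gz' \
        -mtime +"$days" -print -delete
}
```

Calling `prune_backups /home/backups 30` at the end of the backup script would keep roughly the last month of archives.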


Restoring a backup



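To restore a backup, stop DSS, then replace the content of DATA_DIR with the content of the archive. tar stores paths without the leading slash, so an archive created from /path/to/DATA_DIR/ unpacks to path/to/DATA_DIR/ under the extraction directory; extracting with -C / puts the files back in their original location. Moving the existing directory aside first (DATA_DIR.old is only an example name) ensures that no stale files remain:




DATA_DIR/bin/dss stop
mv /path/to/DATA_DIR /path/to/DATA_DIR.old
tar -zxvf your_backup.tar.gz -C /
DATA_DIR/bin/dss start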
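Where the files end up depends on the directory tar extracts into, so it can be safer to unpack a backup into a scratch directory and inspect it before touching the live DATA_DIR. A minimal sketch, where `restore_to` and the staging path are hypothetical names chosen for illustration:

```shell
#!/bin/bash
# Sketch: unpack a backup archive under a chosen root directory.
# restore_to is a hypothetical helper, not a DSS command.
restore_to() {
    local archive="$1"
    local target_root="$2"

    # Create the target if needed, then extract there.
    # -p keeps the permissions recorded in the archive;
    # -C makes tar change to the target directory before extracting.
    mkdir -p "$target_root" && tar -xzpf "$archive" -C "$target_root"
}
```

For instance, `restore_to your_backup.tar.gz /tmp/dss-restore` would produce /tmp/dss-restore/path/to/DATA_DIR for an archive created from /path/to/DATA_DIR/.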

2 Replies
CoreyS
Dataiker Alumni

Updated documentation for backing up your DSS Instance can be found here: https://doc.dataiku.com/dss/latest/operations/backups.html?highlight=backup
