Hi.

We are migrating to Dataiku 5.0.2. To do that, we are exporting all the projects from our current instance and then importing them to the new instance. We have a couple of problems/questions:

  • Some of our projects have bundles. However, after export-import I cannot see the bundles anymore. Can you suggest a way to export-import bundles as well?
  • We have some custom plugins created. Is there any way to export-import them? Or should we recreate each plugin?
  • Also we have some R/Python code environments. Is there a way to export-import them?

Thanks!


1 Answer

Hi Povilas,

In order to migrate an existing Dataiku installation from one instance to another, we advise migrating the 'data directory'. This way there is no need for manual ad-hoc migration actions for each project, code environment, configuration, etc. This is documented in more detail at https://doc.dataiku.com/dss/latest/installation/migrations.html#migrating-the-data-directory.

Hope it helps,
Alexandre
Finally, regarding git integration, we plan major new features in DSS 5.1 (upcoming). In general, we advise checking our release notes for more details: https://doc.dataiku.com/dss/latest/release_notes/index.html
Project export is not an option for archiving, for a couple of reasons. First of all, the backup has to be stored somewhere outside DSS, which is not convenient: the export file could be lost, you need a place to store it, everyone needs access to it, and so on. Secondly, you cannot import a bundle without creating a new project (correct me if I am wrong), so you end up with duplicate projects. Of course, you can delete the old project and then import a different one, but that is still not convenient.

Bundles were an option for this because you create a bundle and just switch between bundles when you need to change version. It is not as nice as simple git branches, but it was relatively OK. But as we see, it is OK only until the time you need to migrate...

I understand that "bundle_activation_backups has no link with the project export or the design node migration", as you said. But are you sure it does not contain all the info about the bundles? I see all the recipes, datasets and other stuff inside those folders... Of course, I don't know what would happen if I just copied that to the new instance. Unfortunately, I don't have an ssh connection to our servers to try it now.
To answer your points:

1. Project exports can be stored anywhere using the Dataiku Python API or the CLI.

2a. Could you detail your current process of using bundles, step by step? Normally, bundles are only used with automation nodes.

2b. What would be the ideal process for you to manage versions of a project in the design node?
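As a rough illustration of point 1, here is a minimal sketch of scripting project exports with the public Python client (the dataiku-api-client package). The helper name export_projects, the host URL, the API key and the destination folder are hypothetical placeholders, and the client is passed in as a parameter so the helper itself has no dependency beyond the standard library. Note that, as reported earlier in this thread, bundles did not survive an export-import round trip, so this only covers the project export itself.

```python
from pathlib import Path


def export_projects(client, project_keys, dest_dir):
    """Export each project to <dest_dir>/<PROJECT_KEY>.zip.

    `client` is expected to behave like dataikuapi.DSSClient:
    client.get_project(key) returns an object with export_to_file(path).
    Returns the list of archive paths that were written.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)

    written = []
    for key in project_keys:
        target = dest / f"{key}.zip"
        # Write the project archive to the chosen location.
        client.get_project(key).export_to_file(str(target))
        written.append(target)
    return written


# Usage sketch (needs the dataiku-api-client package and a reachable instance;
# the URL, key and folder below are placeholders, not real values):
#
#   import dataikuapi
#   client = dataikuapi.DSSClient("https://dss.example.com:11200", "YOUR_API_KEY")
#   export_projects(client, client.list_project_keys(), "/backups/dss-projects")
```

The destination can be any path the script can write to (a network share, for example), which is what "stored anywhere" amounts to in practice.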
1. That does not solve the export-import inconvenience. You either have duplicate projects or you have to delete projects. I mentioned that in my previous comment.

2a. Yes, I can. Go to the project. At the top (in black) you see the project name and buttons to go to the flow, notebooks and others. Further along you can see Bundles. Here you create a new bundle that saves the current version of your project. After that you can create more bundles and restore (revert) to a previous one. I don't want to go into every detail, but I think it is quite straightforward. Everything happens in this Bundles section.

2b. The ideal process is nothing special: the typical usage of source control systems. Creating a branch for each new version and reverting to a specific one when required.
Thanks, that is interesting. I will have a chat with our product team and gather some thoughts on the right way to manage versioning of projects in the design node. Having said that, for your question of how to migrate a design node instance, that does not change our recommendation to migrate the whole data directory.
I understand that. However I am afraid we don't have this possibility anymore.

By the way, I have one more question about migration. I saw and tried that it is possible to export and import custom code envs. Is it possible to export the default R code env? Or maybe make a copy of it and export that copy?
One more question: how do I specify an R package version in an R code env? I tried to specify it according to the example: "xgboost","0.6-4". However, under "Actually installed packages" I see "xgboost","0.71.2". Do you have any idea why this could happen?
Hi, it is not possible to get an older package version using the R code environment UI. Unfortunately, that's the way install.packages works. Alternatively, you could use install_version from the devtools package, as explained in https://support.rstudio.com/hc/en-us/articles/219949047-Installing-older-versions-of-packages. To do so, you would need to manually execute the code in an R notebook or recipe using the specific code env.
Yes, you are correct about install.packages and install_version. I used install_version to install the older version.

However, about managing libraries in Dataiku code envs: you provide an example of how to select an older version. It is here: https://doc.dataiku.com/dss/latest/code-envs/operations-r.html#manage-packages

But that example does not work properly. By "properly" I mean like install_version: for that function you specify an exact version (let's say x.y.z) and it installs version x.y.z. Your interface has an option to do this (that is how it is described), but it is not working. At first I thought that Dataiku installs the latest version available, but later I tried some more examples and found that this is not true. I don't know how it selects a version... It is a shame, because either your functionality is not working or the documentation is misleading (or even incorrect).
Thanks for the feedback, I have logged this so we can improve our documentation.