Problems with bundles and plugins in Dataiku migration

Povilas
Level 2

Hi.



We are migrating to Dataiku 5.0.2. To do that, we are exporting all the projects from our current instance and then importing them into the new instance. We have a couple of problems/questions:




  • Some of our projects have bundles. However, after export-import I cannot see the bundles anymore. Can you suggest a way to export-import bundles as well?

  • We have some custom plugins created. Is there any way to export-import them? Or should we recreate each plugin?

  • Also we have some R/Python code environments. Is there a way to export-import them?



Thanks!

22 Replies
Alex_Combessie
Dataiker Alumni
Hi Povilas, To migrate an existing Dataiku installation from one instance to another, we advise migrating the 'data directory'. This way there is no need for manual, ad hoc migration of each project, code environment, configuration, etc. This is documented in more detail at https://doc.dataiku.com/dss/latest/installation/migrations.html#migrating-the-data-directory. Hope it helps, Alexandre
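As a rough illustration of the "copy the whole data directory" idea, the sketch below packs a directory into a single tarball that can be moved to the new host. It is not the documented procedure: the function name and paths are hypothetical, it assumes DSS has been stopped first, and the remaining steps on the destination host are described only in the linked documentation.

```python
import tarfile
from pathlib import Path


def archive_data_dir(data_dir, archive_path):
    """Pack a directory tree into a gzip-compressed tarball.

    Assumption: DSS is stopped, so files do not change while being read.
    archive_path must live outside data_dir, otherwise the archive
    would try to include itself.
    """
    data_dir = Path(data_dir)
    if not data_dir.is_dir():
        raise FileNotFoundError(f"not a directory: {data_dir}")
    archive_path = Path(archive_path)
    with tarfile.open(archive_path, "w:gz") as tar:
        # arcname keeps only the top-level folder name inside the archive
        tar.add(data_dir, arcname=data_dir.name)
    return archive_path
```

Because the archive contains everything under the directory, projects, code environments and configuration travel together, which is the point of the recommendation above.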
Povilas
Level 2
Author
Hi Alex. Thank you! Just to make sure, will this solve our bundles problem too?
Alex_Combessie
Dataiker Alumni
Hi, this procedure is strictly for migrating projects in a DSS Design node, not Automation node. Is that what you want to achieve? Or do you want to do that from a Design node to an Automation node?
Povilas
Level 2
Author
No, we are migrating from one design node to another design node. The new node will simply be on Dataiku 5.0.
Alex_Combessie
Dataiker Alumni
Migrating the data directory only works between instances running the same Dataiku version. We recommend first migrating the data directory to another instance with the same version, then performing the DSS version upgrade. Hope that clarifies the matter.
Povilas
Level 2
Author
Well, we already have an upgraded instance.

However, let's get back to bundles. If copying the "data directory" would solve the bundles issue (let's say from one instance to another, both with the same DSS version), then those bundles must be stored somewhere in the "data directory". Am I correct? If yes, where exactly are they stored?
Alex_Combessie
Dataiker Alumni
Hi Povilas, "bundles" is a term that applies only to automation nodes. Are you still referring to a situation with design nodes only? And to clarify, have you already upgraded the origin design node to the same version as the destination design node?
Povilas
Level 2
Author
Hi. Yes, I am talking only about design nodes. I don't know how bundles are related to the automation node, but we used bundles for archiving our projects in the design node. For example, we have a project example_project and we want to significantly upgrade/renew it. However, we don't want to lose our old project or create "duplicates" like example_project_v2. So we created a new bundle for the project every time we wanted to make such a big upgrade/renewal. This was our way of archiving projects, and we could restore a project at any time to any older version (or bundle).

So basically we now have that history in our design node and we want to be able to transfer it to the other instance. Is that possible? I believe it should be, since those bundles must be stored somewhere.
Povilas
Level 2
Author
Maybe I have already found it. There is a directory '/data/DATA_DIR/bundle_activation_backups/'. I see the bundles there. Maybe it is enough to simply copy this directory to the new instance?
Povilas
Level 2
Author
A slightly off-topic question, but still... Does DSS 5.0 contain any new features regarding source control, git and versioning? I cannot find anything among the major updates.
Alex_Combessie
Dataiker Alumni
To clarify the terminology: bundles relate to moving a project from design to automation. For saving a project in a design node we use the term project export, not bundles. Both are very similar artifacts, but with different usages and goals. In both cases, you can only move the artifact between nodes of the same version. In your case, if you want to archive versions of projects in the design node, we recommend performing regular project exports. If you want to migrate an instance, then migrating the data directory is the recommended way.
Alex_Combessie
Dataiker Alumni
The path you mention (bundle_activation_backups) has no link with the project export or the design node migration.
Alex_Combessie
Dataiker Alumni
Finally, regarding git integration, we plan major new features in DSS 5.1 (upcoming). In general, we advise checking our release notes for more details: https://doc.dataiku.com/dss/latest/release_notes/index.html
Povilas
Level 2
Author
Project export is not an option for archiving, for a couple of reasons. First of all, the backup has to be stored somewhere outside DSS, which is not convenient: the export file could be lost, you need a place to store it, everyone needs access to it, and so on. Secondly, you cannot import a bundle without creating a new project (correct me if I am wrong), so you end up with duplicate projects. Of course, you can delete the old project and then import a different one, but that is still not convenient.

Bundles were an option for this because you create a bundle and just switch between bundles when you need to change version. Not as nice as simple git branches, but it was still relatively ok. But as we see, it is ok only until you need to migrate...

I understand that "bundle_activation_backups has no link with the project export or the design node migration", as you said. But are you sure it does not contain all the info about the bundles? I see all the recipes, datasets and other stuff inside those folders... Of course, I don't know what would happen if I just copied that to the new instance. Unfortunately I don't have an ssh connection to our servers to try it now.
Alex_Combessie
Dataiker Alumni
To answer your points: 1. Project exports can be stored anywhere using the Dataiku Python API or the CLI. 2a. Could you detail your current process of using bundles, step by step? Normally bundles only deal with automation nodes. 2b. What would be the ideal process for you to manage versions of a project in the design node?
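Point 1 could be sketched roughly as follows. The helper takes a project handle and writes a timestamped export into an archive folder, so older versions are kept side by side outside DSS. The helper name, the file naming scheme, the host, the API key and the paths are all illustrative assumptions; `export_to_file` is the project export method of the `dataikuapi` client as I understand it, so check the API reference for your DSS version before relying on it.

```python
from datetime import datetime, timezone
from pathlib import Path


def export_project_snapshot(project, project_key, archive_dir, now=None):
    """Write one timestamped project export into archive_dir.

    `project` is any object exposing export_to_file(path), for example
    the handle returned by dataikuapi's DSSClient.get_project().
    """
    now = now or datetime.now(timezone.utc)
    archive_dir = Path(archive_dir)
    archive_dir.mkdir(parents=True, exist_ok=True)
    # Hypothetical naming scheme: <PROJECT_KEY>_<UTC timestamp>.zip
    target = archive_dir / f"{project_key}_{now:%Y%m%dT%H%M%SZ}.zip"
    project.export_to_file(str(target))
    return target


# Usage against a real instance (host, key and paths are placeholders):
# import dataikuapi
# client = dataikuapi.DSSClient("http://dss-host:11200", "YOUR_API_KEY")
# export_project_snapshot(client.get_project("EXAMPLE_PROJECT"),
#                         "EXAMPLE_PROJECT", "/backups/dss")
```

This only addresses where exports are stored; restoring a snapshot still means importing it as a project.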
Povilas
Level 2
Author
1. That does not solve the export-import inconvenience. You either have duplicate projects or have to delete projects. I mentioned that in my previous comment.

2a. Yes, I can. Go to the project. At the top (in black) you see the project name and buttons to go to the flow, notebooks and others. Further along you can see Bundles. There you create a new bundle, which saves the current version of your project. After that you can create more bundles and restore (revert) to a previous one. I don't want to go into every detail, but I think it is quite straightforward. Everything happens in this Bundles section.

2b. The ideal process is nothing special, just typical usage of a source control system: creating a branch for each new version and reverting to a specific one when required.
Alex_Combessie
Dataiker Alumni
Thanks, that is interesting. I will have a chat with our product team and gather some thoughts on the right way to manage versioning of projects in the design node. Having said that, regarding your question of how to migrate a design node instance, this does not change our recommendation to migrate the whole data directory.
Povilas
Level 2
Author
I understand that. However, I am afraid we don't have this possibility anymore.

By the way, I have one more question about migration. I saw (and tried) that it is possible to export and import custom code envs. Is it possible to export the default R code env? Or maybe make a copy of it and export that copy?
Povilas
Level 2
Author
One more question: how do I specify an R package version in an R code env? I tried to specify it according to the example: "xgboost","0.6-4". However, in "Actually installed packages" I see "xgboost","0.71.2". Do you have any idea why this could happen?
Alex_Combessie
Dataiker Alumni
Hi, it is not possible to get an older package version using the R code environment UI. Unfortunately, that's the way install.packages works. Alternatively, you could use install_version from the devtools package, as explained in https://support.rstudio.com/hc/en-us/articles/219949047-Installing-older-versions-of-packages. To do so, you would need to manually execute the code in an R notebook or recipe using the specific code env.