I have pre-trained models that I would like to use in a Dataiku recipe.

Solved!
pbena64
Level 2

Hi, I have pre-trained models on my local machine that I would like to use in a recipe. One model is trained using the alibi-detect library, and the other is the popular SAM model. I'd appreciate any tips on how to use these models in a Dataiku recipe.

4 Replies
JordanB
Dataiker

Hi @pbena64,

The DSS Developer Docs include multiple examples of loading and re-using pre-trained models: https://developer.dataiku.com/latest/tutorials/machine-learning/code-env-resources/index.html

You can upload your model to a DSS managed folder on the local filesystem and load it into a Python notebook or recipe: https://developer.dataiku.com/latest/api-reference/python/managed-folders.html#dataiku.Folder.get_path
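
For example, here is a minimal sketch for a filesystem-backed managed folder (the folder name "pkl-models" and file name "test-model.pkl" are placeholders):

import os
import pickle

import dataiku

folder = dataiku.Folder("pkl-models")  # hypothetical managed folder on the local filesystem
folder_path = folder.get_path()  # only works for folders stored on the DSS server's filesystem

with open(os.path.join(folder_path, "test-model.pkl"), "rb") as f:
    model = pickle.load(f)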

Thanks,

Jordan

 

pbena64
Level 2
Author

Hi @JordanB,

Thanks for the help. My problem is similar to the second paragraph of your reply. However, the link takes me to a page with the following statement:

"This method can only be called for managed folders that are stored on the local filesystem of the DSS server. For non-filesystem managed folders (HDFS, S3, โ€ฆ), you need to use the various read/download and write/upload APIs."

which is the case for me. I would appreciate it if you could please point me to the read/write APIs mentioned in the quote.

Thanks!

JordanB
Dataiker

Hi @pbena64,

You mentioned in your original post that the model is on your local machine. Please see the following documentation for the difference between the read/write APIs for local and non-local managed folders: https://doc.dataiku.com/dss/latest/connecting/managed_folders.html

You'll need to use the get_download_stream() API: https://developer.dataiku.com/latest/api-reference/python/managed-folders.html#dataiku.Folder.get_download_stream

import dataiku
import pickle

remote_folder = dataiku.Folder("pkl-models")  # a managed folder on an S3 connection

# Save: get_writer() returns a writable file-like object, so pickle.dump() can write to it directly
with remote_folder.get_writer("test-model.pkl") as writer:
    pickle.dump(clf, writer)  # assuming clf is a scikit-learn estimator

# Load: get_download_stream() returns a readable file-like stream
with remote_folder.get_download_stream("test-model.pkl") as f:
    clf_loaded = pickle.load(f)
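
The same pattern extends beyond scikit-learn. For instance, here is a sketch for loading a SAM checkpoint (a PyTorch state dict) from the same folder, assuming the segment-anything package is installed and a checkpoint named sam_vit_b.pth has already been uploaded; the stream is buffered in memory first because torch.load needs a seekable file object:

import io

import dataiku
import torch
from segment_anything import sam_model_registry  # assumes segment-anything is in the code env

folder = dataiku.Folder("pkl-models")

# Buffer the checkpoint; torch.load requires a seekable file-like object
with folder.get_download_stream("sam_vit_b.pth") as f:
    buffer = io.BytesIO(f.read())

state_dict = torch.load(buffer, map_location="cpu")
sam = sam_model_registry["vit_b"]()  # build the architecture, then load the weights
sam.load_state_dict(state_dict)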

 

Thanks,

Jordan

 

pbena64
Level 2
Author

Hi @JordanB,

Thanks for the response. It solved my problem. However, the issue I have now is that pickling Keras models is only supported from TensorFlow (specifically Keras) >= 2.13.0, while the latest version I can get with the code env installation is 2.12.0.
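
In the meantime, a possible workaround I'm looking at (just a sketch, assuming a compiled Keras model object named model and the same managed folder as above) is to skip pickle and use Keras's native HDF5 save format, which works on TF 2.12:

import tempfile

import dataiku
from tensorflow import keras

folder = dataiku.Folder("pkl-models")

# Save: write to a local temp file in HDF5 format, then upload it to the folder
model.save("/tmp/my-model.h5")  # `model` is a hypothetical compiled Keras model
folder.upload_file("my-model.h5", "/tmp/my-model.h5")

# Load: stream the file back into a local temp file, then load it
with folder.get_download_stream("my-model.h5") as stream:
    with tempfile.NamedTemporaryFile(suffix=".h5", delete=False) as tmp:
        tmp.write(stream.read())
reloaded = keras.models.load_model(tmp.name)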

I will start a new question for this.

Regards,

