
I have a model opened within an Analysis and exported it to a Jupyter notebook.

 

The model has one text feature that uses TF/IDF vectorization: 

The model in the notebook is using TruncatedSVD/HashingVectorizer. This is the 'default' option in the model design page, i.e. the option gets selected when a text feature is added to a model: 

But I changed that default option to TF/IDF vectorization as evident from the second image and trained the model. 

I can modify the notebook and use tf-idf as designed. 
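For illustration, the manual swap might look roughly like the sketch below. It assumes the notebook builds its text features with scikit-learn; the sample texts, variable names, and parameter values are hypothetical and are not taken from the generated notebook.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import HashingVectorizer, TfidfVectorizer
from sklearn.pipeline import make_pipeline

# Hypothetical sample data standing in for the model's text feature.
texts = [
    "the quick brown fox",
    "jumps over the lazy dog",
    "the dog barks at the fox",
    "a quick brown dog",
]

# Approximation of the default option: hash tokens into a fixed-size
# sparse matrix, then reduce it to a few dense components.
hashing_svd = make_pipeline(
    HashingVectorizer(n_features=2**10),
    TruncatedSVD(n_components=2, random_state=0),
)
X_default = hashing_svd.fit_transform(texts)

# Replacement matching the option chosen on the design page: TF-IDF.
tfidf = TfidfVectorizer()
X_tfidf = tfidf.fit_transform(texts)

print(X_default.shape)  # (n_samples, n_components)
print(X_tfidf.shape)    # (n_samples, vocabulary size)
```

The TF-IDF matrix is sparse and has one column per vocabulary term, so any downstream step that expects the small dense output of TruncatedSVD may need adjusting.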

My question is: is it possible to export the model exactly as it was designed?


1 Answer

Best answer

Hi,

Notebook generation only exports a "similar" model (see the documentation: https://doc.dataiku.com/dss/latest/machine-learning/models-export.html#export-to-jupyter-notebook ). It is not possible to export the exact same model, because the actual code may be far more complex than what can fit in a human-editable notebook. The idea is to provide a good-enough starting point that data scientists can build on.

Regards,

Joachim Zentici

Dataiku

Thank you for the quick response. In that case, maybe it would make sense to rename this option from "Export Model" to "Create a similar model"?
The word "Export" is misleading. If I export an Airbus A380, I am expected to deliver an Airbus A380, not an Airbus A340, A350, etc.
Also, I don't think I agree with the statement "It is not possible to export the exact same model as the actual code might be much more complex than something that can fit in a human-editable notebook." I believe the opposite is true: one can do much more, and with more flexibility, in a notebook than with the predefined set of options in Visual Recipes. After all, all the options of Visual Recipes were originally created by a human in a human-editable notebook (or some equivalent thereof), weren't they? :)
Most of the resulting code is generated, so even if the source was indeed written by a human, the output is unfortunately not human-readable.