0 votes
Hi,

When reading a dataset with an R recipe, I find myself struggling with types. In this specific case I would like to know how to prevent DSS from reading my string columns as factors. In R there is an option "stringsAsFactors = F" in the function "data.frame". Is there any equivalent in DSS?

My column is stored as a string but when using a recipe it is read as a factor. I need to compare the value with a specific string and it cannot be done with factors.

Alternatively, would you be able to suggest any function to convert the type? A naive way to do this would be to use as.character but it doesn't work (the output is a number, not my word as a string).

Many thanks,

Raphaëlle

2 Answers

+1 vote
Hi Raphaëlle,

What version of DSS are you using? In DSS 2.1, we switched from stringsAsFactors=T to stringsAsFactors=F, so all of your character columns should now be read as characters, not factors.
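
For reference, the effect of that setting can be reproduced in plain R, outside DSS. The sketch below uses only base R's `data.frame()`; the column names and values are made up for illustration, and it does not call any DSS function.

```r
# Illustrative data only; the column names are hypothetical.
words <- c("Aaaa", "Bbbb", "Cccc")

# Strings are kept as character vectors.
df_chr <- data.frame(id = 1:3, Z = words, stringsAsFactors = FALSE)

# Strings are converted to factors.
df_fct <- data.frame(id = 1:3, Z = words, stringsAsFactors = TRUE)

class(df_chr$Z)  # "character"
class(df_fct$Z)  # "factor"
```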

On a related note, would you like the ability to specify stringsAsFactors or is having stringsAsFactors=F sufficient?

Thanks,

Eric
One other thing:

Because of this change, we deprecated read.dataset(). dkuReadDataset() is now the preferred function.
Hello Eric,

I am using DSS 2.0.4a! Glad to see that this option is handled in 2.1.

In my opinion it would be useful to be able to specify stringsAsFactors for specific cases. However, if the default is F, it should be fine in most situations.

Many thanks!
Regards,
Raphaëlle
0 votes
Hi Raphaëlle,

The trick might be to apply as.character on individual columns, not on the dataframe. Does this page answer your question? http://stackoverflow.com/questions/19204729/how-to-change-factor-labels-into-string-in-a-data-frame
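
A minimal sketch of that per-column approach, assuming a small made-up data frame (the names `df`, `Z` and `n` are hypothetical and do not come from the original data):

```r
# Hypothetical data frame with one factor column and one numeric column.
df <- data.frame(
  Z = factor(c("Aaaa", "Bbbb", "Cccc")),
  n = c(10, 20, 30)
)

# Find the factor columns, then convert each one individually.
is_fct <- vapply(df, is.factor, logical(1))
df[is_fct] <- lapply(df[is_fct], as.character)

# Z is now a character column; n is left untouched.
str(df)
```

Converting column by column leaves the non-factor columns as they are, which is why it is usually preferred over calling as.character on the whole data frame.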
Thank you for your reply, but I am afraid that it doesn't. My question wasn't clear, so allow me to rephrase it.

I am actually trying to apply as.character() to an individual column.
Here is an example (I cannot share the actual data) of the vector Z:
Aaaa
Bbbb
Cccc
So indeed I can use as.character(Z), but for some reason it doesn't always work: the output is numbers instead of "Aaaa", "Bbbb", "Cccc".

The point of my question was that it would probably be more efficient to use an option similar to stringsAsFactors = F when reading the whole dataset.
This page points out the difference: http://stackoverflow.com/questions/2851015/convert-data-frame-columns-from-factors-to-characters
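
As an illustration of where numbers can come from, here is a small base R sketch built on the example vector Z above. It is only a possible explanation, since the thread does not show the code that produced the numeric output: a factor stores integer codes internally, and a numeric conversion returns those codes rather than the labels.

```r
# The example vector from above, stored as a factor.
Z <- factor(c("Aaaa", "Bbbb", "Cccc"))

as.character(Z)  # the labels: "Aaaa" "Bbbb" "Cccc"
as.integer(Z)    # the internal codes: 1 2 3

# Comparing a factor with a string compares against the labels.
Z == "Bbbb"      # FALSE TRUE FALSE
```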