Encoding of character columns is lost after using the dkuWriteDataset() function in an R recipe. My data frame contains non-Latin letters, which appear to be correctly encoded (UTF-8) after importing the dataset with dkuReadDataset(). However, the encoding is lost when I write the resulting data frame further along the project flow: the letters are displayed as question marks "?????" in the next dataset. How can I preserve the encoding when using dkuWriteDataset()?

Thank you in advance.
closed with the note: Solution was found
Hi, were you able to solve your issue?

We are using version 4.1.1. The R recipe seems to work fine if the original data comes from a .csv file, so I can work around the problem that way. However, we normally extract data with an SQL recipe and store it in our internal database (instead of the filesystem). We then use an R recipe to transform the data, and it returns "?????" in the next dataset. For example, Russian letters are returned as "?????".

Also, if I use Dataiku filter/transform recipes, the resulting dataset looks fine and returns Russian letters as expected. So I think it must be something related to the R recipe.

Data could look something like this:
Hi, Thanks for the feedback. It sounds like an R-specific encoding issue. What is the original dataset stored as? How is it produced?
The original data is stored as a Dataiku dataset, which was extracted using an SQL recipe.
Would you be able to send us an actual sample of the data as a file? For instance, you can export the input dataset right before the R recipe. You can send it to alexandre.combessie -at- dataiku.com
I have sent you a dataset.
Hi Vaidas, I am not able to reproduce your issue based on the data you sent me. Is this specific to your SQL server? Can you reproduce it if you write this output to a local filesystem?
Hi, Have you solved your issue?
Yes, by saving the output to the local filesystem. It's probably related to our SQL server setup.

Thanks for your help.
Thanks for the feedback. Have a great day!
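
For readers who hit the same symptom, here is a minimal sketch of how the character columns could be inspected and normalized to UTF-8 inside the R recipe before calling dkuWriteDataset(). It uses only base R (`Encoding()`, `enc2utf8()`, `Sys.getlocale()`). The helper name `to_utf8_columns` and the dataset names in the comments are hypothetical and not part of the dataiku package. This is a diagnostic aid under those assumptions; it may not help when the cause lies in the database connection, as appeared to be the case in this thread.

```r
# Hypothetical helper (not part of the dataiku package): convert every
# character column of a data frame to UTF-8 and mark it as such.
to_utf8_columns <- function(df) {
  is_chr <- vapply(df, is.character, logical(1))
  for (col in names(df)[is_chr]) {
    df[[col]] <- enc2utf8(df[[col]])
  }
  df
}

# Stand-in data frame with Russian text; \u escapes keep the script ASCII-only.
df <- data.frame(
  id = 1:2,
  name = c("\u041f\u0440\u0438\u0432\u0435\u0442", "\u043c\u0438\u0440"),
  stringsAsFactors = FALSE
)

df <- to_utf8_columns(df)

# Inspect the declared encoding of the column and the session locale.
# A non-UTF-8 locale in the R process is one possible source of "?????".
print(Encoding(df$name))
print(Sys.getlocale("LC_CTYPE"))

# Inside a DSS R recipe the same helper would sit between read and write
# (dataset names below are placeholders):
# library(dataiku)
# df <- dkuReadDataset("input_dataset_name")
# dkuWriteDataset(to_utf8_columns(df), "output_dataset_name")
```

If the columns already report "UTF-8" before the write and the output still shows question marks only when the target is the SQL database, that points toward the connection or database character set rather than the R recipe, which would be consistent with the resolution above.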