I have a Dataiku Twitter Stream dataset which collects around 1GB of data per week. Now I have around 10GB of data, which has overloaded my Dataiku server. I managed to clear some space, but I will soon hit the size limit again.


I also have a large data cluster connected to the Dataiku server, but I can't store the Twitter stream directly on it. What would be a scalable way to transfer data from the Twitter Stream dataset to my cluster automatically every day?

1 Answer

Hi, I'd suggest looking into a Kafka or Samza setup and routing the Twitter stream through it. That way the messages are stored and processed in that layer, without overloading your Dataiku server.

Hope it helps.
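To make the daily-transfer part concrete, here is a minimal sketch of a batch offload job in Python. It is only an illustration: the tweet-dict shape (a Unix `timestamp` field), the `offload`/`partition_by_day` names, and the `dt=YYYY-MM-DD` partition layout are all assumptions, and a real job would write to the cluster (e.g. HDFS via a sync recipe or an `hdfs` client) rather than the local filesystem.

```python
import json
import os
import tempfile
from collections import defaultdict
from datetime import datetime, timezone

def partition_by_day(tweets):
    """Group tweet dicts into per-day batches keyed by YYYY-MM-DD (UTC)."""
    batches = defaultdict(list)
    for tweet in tweets:
        day = datetime.fromtimestamp(
            tweet["timestamp"], tz=timezone.utc
        ).strftime("%Y-%m-%d")
        batches[day].append(tweet)
    return batches

def offload(tweets, root):
    """Write each day's batch as newline-delimited JSON under
    root/dt=<day>/tweets.json and return the written paths.

    In production, `root` would be a cluster path (e.g. an HDFS
    directory) instead of a local directory.
    """
    paths = []
    for day, batch in sorted(partition_by_day(tweets).items()):
        part_dir = os.path.join(root, f"dt={day}")
        os.makedirs(part_dir, exist_ok=True)
        path = os.path.join(part_dir, "tweets.json")
        with open(path, "w") as f:
            for tweet in batch:
                f.write(json.dumps(tweet) + "\n")
        paths.append(path)
    return paths

# Example run against a temporary directory standing in for the cluster.
tweets = [
    {"timestamp": 0, "text": "first day"},
    {"timestamp": 86400, "text": "second day"},
]
written = offload(tweets, tempfile.mkdtemp())
```

Scheduling this once a day (a Dataiku scenario or a cron job) and truncating the source dataset after each successful transfer would keep the server's local copy small while the full history accumulates on the cluster.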

