


I have a service that continuously emits data. You start receiving data as soon as you open a TCP connection, and it never stops until you terminate the connection.

I'd like to develop a custom plugin to process that data in Dataiku. How can I do that, given that the data never ends?

Will the "build" log overload the server?




We are loading data from a flight metasearch service. They expose a data stream that we consume by polling a TCP connection. We plan to use Dataiku to parse, sanitize, ... the data and then drop it into Hadoop, in addition to applying the corresponding analysis in the lab ;)

@alexander Hope this helps

Hi Gustavo, this is an interesting topic. To best advise you, I would like to better understand the context. What type of data do you receive? Do you have an estimate of the volume? What technologies do you have in mind for the processing and the storage? Cheers, Alex
Any news on this Gustavo?

1 Answer

Hi Gustavo,

For this type of use case, we would advise performing the data ingestion outside of Dataiku DSS, with a streaming engine such as Flume or Kafka.
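
To give an idea of what "ingestion outside of DSS" can look like, here is a minimal sketch in plain Python rather than Flume or Kafka. It assumes the service sends newline-delimited text records, and it appends each record to a file in an hourly folder. The function names, the `part.log` file name, and the `YYYY-MM-DD-HH` folder naming are all hypothetical choices for illustration, not something DSS requires.

```python
import socket
from datetime import datetime, timezone
from pathlib import Path
from typing import Optional


def partition_path(root: Path, ts: datetime) -> Path:
    """Return the file for the hourly partition that contains ts."""
    # Hypothetical layout: <root>/<YYYY-MM-DD-HH>/part.log
    return root / ts.strftime("%Y-%m-%d-%H") / "part.log"


def ingest(host: str, port: int, root: Path,
           max_records: Optional[int] = None) -> int:
    """Read newline-delimited records from a TCP stream and append each
    one to the partition file for the hour in which it was received.

    Runs until the peer closes the connection, or until max_records
    records have been written when that limit is given.
    """
    count = 0
    with socket.create_connection((host, port)) as conn, \
            conn.makefile("r", encoding="utf-8") as stream:
        for line in stream:
            record = line.rstrip("\n")
            if not record:
                continue  # skip empty keep-alive lines
            path = partition_path(root, datetime.now(timezone.utc))
            path.parent.mkdir(parents=True, exist_ok=True)
            with path.open("a", encoding="utf-8") as out:
                out.write(record + "\n")
            count += 1
            if max_records is not None and count >= max_records:
                break
    return count
```

A real deployment would also need reconnection and error handling, which is a large part of what a dedicated ingestion tool gives you.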

Once the data is ingested, you can perform data transformation and machine learning modelling in DSS in micro-batches, using partitions to avoid recomputing over the whole dataset.
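
The micro-batch idea can be sketched in plain Python as follows. This is not the DSS API, only an illustration of the logic: each run processes the partitions that are complete and have not been handled yet, and leaves everything else untouched. It assumes hourly partition identifiers such as `2018-01-31-14`, which sort chronologically as strings. The helper names are hypothetical.

```python
from typing import Callable, Iterable, List, Set


def pending_partitions(all_partitions: Iterable[str],
                       done: Set[str],
                       current: str) -> List[str]:
    """Return partitions that are complete and not yet processed.

    A partition counts as complete when it is strictly older than
    `current`, the partition still being written by the ingestion side.
    """
    return sorted(p for p in all_partitions if p < current and p not in done)


def run_micro_batch(all_partitions: Iterable[str],
                    done: Set[str],
                    current: str,
                    process: Callable[[str], None]) -> List[str]:
    """Process each pending partition once and record it in `done`."""
    processed = []
    for partition in pending_partitions(all_partitions, done, current):
        process(partition)   # e.g. parse and sanitize this hour only
        done.add(partition)  # never recompute this partition again
        processed.append(partition)
    return processed
```

For example, with partitions `2018-01-31-12` to `2018-01-31-14`, `done` containing `2018-01-31-12`, and `2018-01-31-14` still being written, a run processes only `2018-01-31-13`.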


