Append instead of Overwrite dataset - API

GS · ‎08-26-2019

Hello,

We have created a webapp that is connected to an API model doing speech classification.

The user accesses the webapp and records their speech, and the model answers with the 3 best-fitting classes.
The user can select the one that best fits what they said, and we want to store this result in Dataiku for later analysis.
Several users will be using it in parallel, so we will have many results to store, and we need to collect all of them.

Now it seems the API we are using to write those results only overwrites the dataset and does not append to it.
Could you please advise on how to do this?

Thanks

AdrienL · ‎08-26-2019

Hi,

You can try using partitioning on that dataset, to write to a new partition at every new session.
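
As a rough sketch of this suggestion, assuming the webapp has a Python backend: each session gets its own partition identifier, and that session's rows are written to that partition. `new_session_partition_id` and `store_session_result` are hypothetical helper names, and the identifier format is an assumption. `set_write_partition` and `write_with_schema` are methods of `dataiku.Dataset`, but check that they behave as expected with your dataset's partitioning scheme.

```python
import uuid
from datetime import datetime, timezone


def new_session_partition_id(now=None):
    """Build a partition identifier that is unique per session (hypothetical format)."""
    now = now or datetime.now(timezone.utc)
    # The timestamp keeps partitions sortable; the random suffix avoids
    # collisions between sessions that start within the same second.
    return "{}-{}".format(now.strftime("%Y%m%dT%H%M%S"), uuid.uuid4().hex[:8])


def store_session_result(dataset, rows, partition_id):
    """Write one session's rows to its own partition.

    `dataset` is assumed to be a dataiku.Dataset handle created in the webapp
    backend, and `rows` a pandas DataFrame holding that session's result.
    """
    dataset.set_write_partition(partition_id)
    dataset.write_with_schema(rows)
```

Because every session writes to a different partition, the overwrite behaviour only ever replaces that session's own data.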

GS · ‎08-26-2019

Thanks Adrien,
Yes, but that would result in one record per partition, which doesn't sound very efficient.

Any other option?
The perfect solution would be if we could append a new line every time there is a new session. Is that possible?

Thanks

AdrienL · ‎08-26-2019

That is not possible through the API AFAIK. You'd have to read/append/write, but then nothing protects you against concurrent writes: two sessions writing at the same time can overwrite each other's rows.
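
To make the concurrency problem concrete, here is a small simulation of read/append/write with two overlapping sessions. A plain dict stands in for the dataset; nothing here is the Dataiku API, and `read_append_write` is a hypothetical helper.

```python
def read_append_write(store, snapshot, new_row):
    """Overwrite the stored rows with the caller's snapshot plus one new row."""
    store["rows"] = snapshot + [new_row]


# A plain dict stands in for the dataset (not the Dataiku API).
store = {"rows": []}

# Two sessions read the dataset before either one has written back.
snapshot_a = list(store["rows"])
snapshot_b = list(store["rows"])

# Each session appends to its own stale copy and overwrites the dataset.
read_append_write(store, snapshot_a, {"session": "A", "label": "greeting"})
read_append_write(store, snapshot_b, {"session": "B", "label": "question"})

# Only session B's row is left: session A's write was silently lost.
print(store["rows"])
```

Avoiding this would need some form of locking around the whole read/append/write cycle, which is why writing each session to its own partition is the simpler route.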

If you're prepared to complicate your flow a bit to get everything into one partition / unpartitioned dataset, you could write into a partitioned dataset, then have a scheduled scenario that runs every day and syncs this partitioned dataset into an unpartitioned one. The scenario would either 1. re-sync all partitions in normal (overwrite) mode, or 2. sync all partitions in append mode and then remove those input partitions.
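
The logic of option 2 can be sketched in plain Python. The dict and list below are stand-ins for the partitioned and unpartitioned datasets, and `consolidate_append` is a hypothetical name; in Dataiku the same steps would be carried out by a Sync recipe in append mode plus a step that clears the input partitions.

```python
def consolidate_append(partitions, consolidated):
    """Append every pending partition to `consolidated`, then drop it.

    `partitions` maps partition id -> list of rows. Both arguments are plain
    Python stand-ins for the partitioned and unpartitioned datasets.
    """
    for partition_id in sorted(partitions):
        consolidated.extend(partitions[partition_id])
        # Remove the partition only after its rows have been appended,
        # so that a later run does not append them a second time.
        del partitions[partition_id]
    return consolidated


pending = {
    "session-2": [{"label": "question"}],
    "session-1": [{"label": "greeting"}],
}
history = []
consolidate_append(pending, history)  # first scheduled run
consolidate_append(pending, history)  # second run finds nothing new
```

Removing the input partitions is what keeps the append idempotent across runs; without it, every scheduled run would duplicate the earlier rows.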

AdrienL · ‎08-26-2019

To add to that, if you partition a SQL dataset, the partitioning is implemented as an additional column. So you actually have a partitioned dataset but an unpartitioned, consolidated SQL table. You can even define another unpartitioned dataset on that table, but you a/ should set it as read-only to prevent accidental writes and b/ lose the logical connection in the flow of how it's built.
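
The layout described here can be illustrated with SQLite from the Python standard library. This is only a sketch of the idea: the table name `speech_results`, the columns `session_id` and `chosen_label`, and the helper `append_result` are all hypothetical, and SQLite stands in for whatever SQL database backs the dataset. It does not show the SQL that Dataiku itself generates.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The partitioning dimension is just one more column in the table.
conn.execute("CREATE TABLE speech_results (session_id TEXT, chosen_label TEXT)")


def append_result(conn, session_id, chosen_label):
    """Insert one row for one session."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT INTO speech_results (session_id, chosen_label) VALUES (?, ?)",
            (session_id, chosen_label),
        )


append_result(conn, "s1", "greeting")
append_result(conn, "s2", "question")

# One "partition" is a filter on the partitioning column...
one_partition = conn.execute(
    "SELECT chosen_label FROM speech_results WHERE session_id = ?", ("s1",)
).fetchall()

# ...while the table as a whole is the consolidated, unpartitioned view.
all_rows = conn.execute(
    "SELECT session_id, chosen_label FROM speech_results ORDER BY session_id"
).fetchall()
```

Reading the whole table gives the consolidated data without any extra sync step, which is the point of defining a second, read-only dataset on it.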

GS · ‎08-26-2019

Thanks, I will try the partitioning with Append Sync.

Labels

code

Python

Webapps
