Identify lines based on partition variable

JBR · 08-04-2017

Hi,

I'm creating datasets based on files in an S3 bucket.

The files in the bucket are in a single folder, but have several name patterns, such as "blue_01012017.csv", "red_02012017.csv", etc.

Using partitioning, I have defined "blue", "red", etc. as a partition variable called "source". This information is not included in the data itself.

What I want to do is either:

- directly split my dataset based on that "source" value

- or include a "source" column in my dataset that would have the appropriate value for each line, based on the file it came from, so I can split it later based on that value.

I can't seem to find a way to do this, can you help?

Thanks a lot in advance,

Julien

Clément_Stenac · 08-04-2017

It is indeed not currently possible to retrieve the source partition as a value inside the data.

You can, however, achieve the split with multiple sync recipes, each of which selects a single input partition using partition dependencies.
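
For illustration only, the same per-source split can be sketched outside DSS in plain Python. This assumes the file names follow the `<source>_<date>.csv` pattern from the question. The names `source_from_filename`, `group_files_by_source` and `add_source_column` are hypothetical helpers, not part of any Dataiku API.

```python
import re
from collections import defaultdict

# Assumed naming pattern, taken from the examples in the question:
# "<source>_<8 digits>.csv", e.g. "blue_01012017.csv".
FILENAME_PATTERN = re.compile(r"^(?P<source>[A-Za-z]+)_(?P<date>\d{8})\.csv$")


def source_from_filename(filename):
    """Return the 'source' part of a file name, or None if it does not match."""
    match = FILENAME_PATTERN.match(filename)
    return match.group("source") if match else None


def group_files_by_source(filenames):
    """Group file names by source, skipping names that do not match the pattern."""
    groups = defaultdict(list)
    for name in filenames:
        source = source_from_filename(name)
        if source is not None:
            groups[source].append(name)
    return dict(groups)


def add_source_column(rows, filename):
    """Yield a copy of each row (a dict) with an extra 'source' key."""
    source = source_from_filename(filename)
    for row in rows:
        yield {**row, "source": source}


if __name__ == "__main__":
    files = ["blue_01012017.csv", "red_02012017.csv", "blue_03012017.csv"]
    print(group_files_by_source(files))
```

Grouping the files first mirrors the "one recipe per partition" approach, while `add_source_column` corresponds to the second option in the question, where each line carries its source value so the data can be split later.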

JBR · 08-08-2017

Thanks Clément, it does the trick perfectly!

Labels

Datasets

Partitioning