Synchronization between Google Cloud Storage (BigQuery CSV compatible) and BigQuery failed

otassel
Level 2

Hello,



Synchronization between Google Cloud Storage (BigQuery CSV compatible) and BigQuery failed with this error: "Input CSV files are not splittable and at least one of the files is larger than the maximum allowed size. Size is: 12178247589. Max allowed size is: 4294967296."





Here are the files on Google Cloud Storage:





How can I solve this issue? How can I get DSS to generate files on Google Cloud Storage of no more than 4 GB each (i.e., increase the number of generated files), so that each file stays under the 4 GB limit reported in the error?

2 Replies
Clément_Stenac
Hi,

At the moment, you'll need to cheat a bit to force DSS to generate more files:

* Use a "Filter/Sampling" recipe, set to sample the first 999999999999 records
* Go to the settings of the output dataset > Advanced, and set "write bucketing" to a high enough value (probably at least 30 in your case)
otassel
Level 2
Author
Thanks Clément. I found another solution that saved me one flow step compared with your answer: I increased Max Threads to 30 in the recipe's "Advanced" tab. The recipe then generated one file per thread.

Thanks for your help
