[2024/04/05-15:51:41.375] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.prediction] - [ct: 0] ****************************************** [2024/04/05-15:51:41.376] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.prediction] - [ct: 1] ** Start train session s23 [2024/04/05-15:51:41.376] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.prediction] - [ct: 1] ****************************************** [2024/04/05-15:51:41.377] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.prediction.strat.prns] T-7LD9R0gY - [ct: 2] Preparing base & partitions splits [2024/04/05-15:51:41.379] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.shaker.data] T-7LD9R0gY - [ct: 4] Need to compute sampleId before checking memory cache [2024/04/05-15:51:41.380] [FT-TrainWorkThread-HhiinFh0-110689] [DEBUG] [dip.shaker.runner] T-7LD9R0gY - [ct: 5] Script settings sampleMax=104857600 processedMax=-1 [2024/04/05-15:51:41.380] [FT-TrainWorkThread-HhiinFh0-110689] [DEBUG] [dip.shaker.runner] T-7LD9R0gY - [ct: 5] Processing with sampleMax=104857600 processedMax=524288000 [2024/04/05-15:51:41.380] [FT-TrainWorkThread-HhiinFh0-110689] [DEBUG] [dip.shaker.runner] T-7LD9R0gY - [ct: 5] Computed required sample id : 91b05c4178f57e7920740624fbb4f6f8-NA-35120b286a1c368f0ab776b9e8094f9c1708632931807--d751713988987e9331980363e24189ce [2024/04/05-15:51:41.381] [FT-TrainWorkThread-HhiinFh0-110689] [DEBUG] [dku.shaker.cache] T-7LD9R0gY - Shaker MemoryCache get on dataset SDL_ORIGINAL.stac_copy key=ds=708e6c8d4c3fe39ec7eea23b458bc74e--scr=3873dd015130b48f51e9def96a704d68--samp=91b05c4178f57e7920740624fbb4f6f8-NA-35120b286a1c368f0ab776b9e8094f9c1708632931807--d751713988987e9331980363e24189ce: hit [2024/04/05-15:51:41.381] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.shaker.schema] T-7LD9R0gY - [ct: 6] Column MarketName meaning=Text fail=0 [2024/04/05-15:51:41.381] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.shaker.schema] T-7LD9R0gY - [ct: 6] Column CountryName 
meaning=Text fail=0 [2024/04/05-15:51:41.382] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.shaker.schema] T-7LD9R0gY - [ct: 7] Column Reported_Month meaning=LongMeaning fail=0 [2024/04/05-15:51:41.382] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.shaker.schema] T-7LD9R0gY - [ct: 7] Column Fiscal_Year meaning=LongMeaning fail=0 [2024/04/05-15:51:41.382] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.shaker.schema] T-7LD9R0gY - [ct: 7] Column Contract_Term meaning=LongMeaning fail=0 [2024/04/05-15:51:41.382] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.shaker.schema] T-7LD9R0gY - [ct: 7] Column Product_Key meaning=Text fail=0 [2024/04/05-15:51:41.382] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.shaker.schema] T-7LD9R0gY - [ct: 7] Column Quantity meaning=DoubleMeaning fail=0 [2024/04/05-15:51:41.383] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.shaker.schema] T-7LD9R0gY - [ct: 8] Column ProductLine meaning=Text fail=0 [2024/04/05-15:51:41.383] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.shaker.schema] T-7LD9R0gY - [ct: 8] Column Billing_Model_Desc meaning=Text fail=0 [2024/04/05-15:51:41.383] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.shaker.schema] T-7LD9R0gY - [ct: 8] Column HWPurchaseDesc meaning=Text fail=0 [2024/04/05-15:51:41.383] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.shaker.schema] T-7LD9R0gY - [ct: 8] Column BreakfixSLADesc meaning=FreeText fail=0 [2024/04/05-15:51:41.383] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.shaker.schema] T-7LD9R0gY - [ct: 8] Column Net_Price meaning=DoubleMeaning fail=0 [2024/04/05-15:51:41.384] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.shaker.schema] T-7LD9R0gY - [ct: 9] Column Global_deal_flag meaning=Text fail=0 [2024/04/05-15:51:41.384] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.shaker.schema] T-7LD9R0gY - [ct: 9] Column Industry.Vertical.Segment.Name meaning=Text fail=0 [2024/04/05-15:51:41.384] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.shaker.schema] 
T-7LD9R0gY - [ct: 9] Column Deal_Type meaning=Text fail=0 [2024/04/05-15:51:41.384] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.shaker.schema] T-7LD9R0gY - [ct: 9] Column clv meaning=DoubleMeaning fail=0 [2024/04/05-15:51:41.384] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.shaker.schema] T-7LD9R0gY - [ct: 9] Column average_price_6_months_is_blank meaning=LongMeaning fail=0 [2024/04/05-15:51:41.385] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.shaker.schema] T-7LD9R0gY - [ct: 10] Column Max_ListPrice meaning=DoubleMeaning fail=0 [2024/04/05-15:51:41.385] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.shaker.schema] T-7LD9R0gY - [ct: 10] Column ComponentKey meaning=Text fail=0 [2024/04/05-15:51:41.385] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.shaker.schema] T-7LD9R0gY - [ct: 10] Column price_change_rate meaning=DoubleMeaning fail=0 [2024/04/05-15:51:41.386] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 11] Search for split: p=type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=maintenancekit),r=0.8,s=1337 i=73dcf069c572a6dc8a9ff1b5e12aec7c-5-part-maintenancekit [2024/04/05-15:51:41.407] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.datasets.azure] T-7LD9R0gY - [ct: 32] Building Azure handler config: {"container":"innovation-services","baseBlockID":0,"metastoreSynchronizationEnabled":true,"metastoreTableName":"stac_copy","connection":"ebstgebiadls","path":"/collaboration/SDL_ORIGINAL/stac_copy","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}} [2024/04/05-15:51:41.408] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.datasets.fsbased] T-7LD9R0gY - [ct: 33] Get filter splits for com.dataiku.dip.input.filter.InputFilter@301317a7 [2024/04/05-15:51:41.408] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.datasets.bloblike] T-7LD9R0gY - [ct: 33] Enumerating blob-like dataset SDL_ORIGINAL.stac_copy 
prefix=/maintenancekit/ [2024/04/05-15:51:41.408] [FT-TrainWorkThread-HhiinFh0-110689] [DEBUG] [dku.datasets.fsbased] T-7LD9R0gY - [ct: 33] Building FS provider for dataset handler: SDL_ORIGINAL.stac_copy [2024/04/05-15:51:41.409] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.fsproviders.azure] T-7LD9R0gY - [ct: 34] Effective root : '/service-analytics/collaboration/SDL_ORIGINAL/stac_copy' from 'service-analytics' / '/collaboration/SDL_ORIGINAL/stac_copy' [2024/04/05-15:51:41.409] [FT-TrainWorkThread-HhiinFh0-110689] [DEBUG] [dku.datasets.fsbased] T-7LD9R0gY - [ct: 34] FS Provider built [2024/04/05-15:51:41.409] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.azure] T-7LD9R0gY - Using access token cache key: '6d1ad370-59dd-43ff-b215-09424e39101e [client secret] ' https://storage.azure.com/ (global) [2024/04/05-15:51:41.410] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.azure] T-7LD9R0gY - Cached OAuth2 access token found [2024/04/05-15:51:41.410] [FT-TrainWorkThread-HhiinFh0-110689] [DEBUG] [dku.fsproviders.azure] T-7LD9R0gY - [ct: 35] enumerateFS prefix=/maintenancekit/ path=/service-analytics/collaboration/SDL_ORIGINAL/stac_copy/maintenancekit [2024/04/05-15:51:41.410] [FT-TrainWorkThread-HhiinFh0-110689] [DEBUG] [dku.fsproviders.azure] T-7LD9R0gY - [ct: 35] Listing blobs with path: service-analytics/collaboration/SDL_ORIGINAL/stac_copy/maintenancekit [2024/04/05-15:51:41.473] [FT-TrainWorkThread-HhiinFh0-110689] [DEBUG] [dku.fsproviders.azure] T-7LD9R0gY - [ct: 98] Done Azure enumeration nb_files=1 total_size=8049355 [2024/04/05-15:51:41.474] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.input.push] T-7LD9R0gY - USTP: push selection.method=FULL records=1000 ratio=0.05 col=MarketName [2024/04/05-15:51:41.474] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.format] T-7LD9R0gY - [ct: 99] Extractor run: limit={"maxBytes":-1,"maxRecords":-1,"ordering":{"enabled":false,"rules":[]}} totalRecords=0 [2024/04/05-15:51:41.488] 
[FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku] T-7LD9R0gY - getCompression filename=**service-analytics/collaboration/SDL_ORIGINAL/stac_copy/maintenancekit/out-s0.csv.gz** [2024/04/05-15:51:41.500] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku] T-7LD9R0gY - getCompression filename=**service-analytics/collaboration/SDL_ORIGINAL/stac_copy/maintenancekit/out-s0.csv.gz** [2024/04/05-15:51:41.501] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.format] T-7LD9R0gY - [ct: 126] Start compressed [GZIP] stream: /maintenancekit/out-s0.csv.gz / totalRecsBefore=0 [2024/04/05-15:51:41.501] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku] T-7LD9R0gY - getCompression filename=**service-analytics/collaboration/SDL_ORIGINAL/stac_copy/maintenancekit/out-s0.csv.gz** [2024/04/05-15:51:41.501] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku] T-7LD9R0gY - getCompression filename=**service-analytics/collaboration/SDL_ORIGINAL/stac_copy/maintenancekit/out-s0.csv.gz** [2024/04/05-15:51:43.260] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.format.csv] T-7LD9R0gY - [ct: 1885] CSV Emitted 100000 lines from file, 20 columns - interned: 42849 MEM: 24.6826171875% [2024/04/05-15:51:44.042] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.format] T-7LD9R0gY - [ct: 2667] after stream totalComp=8049355 totalUncomp=34808899 totalRec=160237 [2024/04/05-15:51:44.043] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.format] T-7LD9R0gY - [ct: 2668] Extractor run done, totalCompressed=8049355 totalRecords=160237 [2024/04/05-15:51:44.047] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 2672] Checking if splits are up to date. 
Policy: type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=maintenancekit),r=0.8,s=1337, instance id: 73dcf069c572a6dc8a9ff1b5e12aec7c-5-part-maintenancekit [2024/04/05-15:51:44.047] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 2672] Search for split: p=type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=maintenancekit),r=0.8,s=1337 i=73dcf069c572a6dc8a9ff1b5e12aec7c-5-part-maintenancekit [2024/04/05-15:51:44.057] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 2682] Search for split: p=type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=maintenancekit),r=0.8,s=1337 i=73dcf069c572a6dc8a9ff1b5e12aec7c-5-part-maintenancekit [2024/04/05-15:51:44.066] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 2691] Checking if splits are up to date. Policy: type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=maintenancekit),r=0.8,s=1337, instance id: 73dcf069c572a6dc8a9ff1b5e12aec7c-5-part-maintenancekit [2024/04/05-15:51:44.067] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 2692] Search for split: p=type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=maintenancekit),r=0.8,s=1337 i=73dcf069c572a6dc8a9ff1b5e12aec7c-5-part-maintenancekit [2024/04/05-15:51:44.076] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 2701] Search for split: p=type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=maintenancekit),r=0.8,s=1337 i=73dcf069c572a6dc8a9ff1b5e12aec7c-5-part-maintenancekit [2024/04/05-15:51:44.087] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 2712] Search for split: 
p=type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=printer),r=0.8,s=1337 i=21b6a9871978f1466e881059b2b6f39f-5-part-printer [2024/04/05-15:51:44.098] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.datasets.azure] T-7LD9R0gY - [ct: 2723] Building Azure handler config: {"container":"innovation-services","baseBlockID":0,"metastoreSynchronizationEnabled":true,"metastoreTableName":"stac_copy","connection":"ebstgebiadls","path":"/collaboration/SDL_ORIGINAL/stac_copy","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}} [2024/04/05-15:51:44.098] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.datasets.fsbased] T-7LD9R0gY - [ct: 2723] Get filter splits for com.dataiku.dip.input.filter.InputFilter@1bf287d1 [2024/04/05-15:51:44.098] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.datasets.bloblike] T-7LD9R0gY - [ct: 2723] Enumerating blob-like dataset SDL_ORIGINAL.stac_copy prefix=/printer/ [2024/04/05-15:51:44.099] [FT-TrainWorkThread-HhiinFh0-110689] [DEBUG] [dku.datasets.fsbased] T-7LD9R0gY - [ct: 2724] Building FS provider for dataset handler: SDL_ORIGINAL.stac_copy [2024/04/05-15:51:44.099] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.fsproviders.azure] T-7LD9R0gY - [ct: 2724] Effective root : '/service-analytics/collaboration/SDL_ORIGINAL/stac_copy' from 'service-analytics' / '/collaboration/SDL_ORIGINAL/stac_copy' [2024/04/05-15:51:44.099] [FT-TrainWorkThread-HhiinFh0-110689] [DEBUG] [dku.datasets.fsbased] T-7LD9R0gY - [ct: 2724] FS Provider built [2024/04/05-15:51:44.100] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.azure] T-7LD9R0gY - Using access token cache key: '6d1ad370-59dd-43ff-b215-09424e39101e [client secret] ' https://storage.azure.com/ (global) [2024/04/05-15:51:44.100] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.azure] T-7LD9R0gY - Cached OAuth2 access token found [2024/04/05-15:51:44.100] [FT-TrainWorkThread-HhiinFh0-110689] 
[DEBUG] [dku.fsproviders.azure] T-7LD9R0gY - [ct: 2725] enumerateFS prefix=/printer/ path=/service-analytics/collaboration/SDL_ORIGINAL/stac_copy/printer [2024/04/05-15:51:44.100] [FT-TrainWorkThread-HhiinFh0-110689] [DEBUG] [dku.fsproviders.azure] T-7LD9R0gY - [ct: 2725] Listing blobs with path: service-analytics/collaboration/SDL_ORIGINAL/stac_copy/printer [2024/04/05-15:51:44.122] [FT-TrainWorkThread-HhiinFh0-110689] [DEBUG] [dku.fsproviders.azure] T-7LD9R0gY - [ct: 2747] Done Azure enumeration nb_files=1 total_size=2823848 [2024/04/05-15:51:44.123] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.input.push] T-7LD9R0gY - USTP: push selection.method=FULL records=1000 ratio=0.05 col=MarketName [2024/04/05-15:51:44.123] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.format] T-7LD9R0gY - [ct: 2748] Extractor run: limit={"maxBytes":-1,"maxRecords":-1,"ordering":{"enabled":false,"rules":[]}} totalRecords=0 [2024/04/05-15:51:44.135] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku] T-7LD9R0gY - getCompression filename=**service-analytics/collaboration/SDL_ORIGINAL/stac_copy/printer/out-s0.csv.gz** [2024/04/05-15:51:44.147] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku] T-7LD9R0gY - getCompression filename=**service-analytics/collaboration/SDL_ORIGINAL/stac_copy/printer/out-s0.csv.gz** [2024/04/05-15:51:44.147] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.format] T-7LD9R0gY - [ct: 2772] Start compressed [GZIP] stream: /printer/out-s0.csv.gz / totalRecsBefore=0 [2024/04/05-15:51:44.147] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku] T-7LD9R0gY - getCompression filename=**service-analytics/collaboration/SDL_ORIGINAL/stac_copy/printer/out-s0.csv.gz** [2024/04/05-15:51:44.148] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku] T-7LD9R0gY - getCompression filename=**service-analytics/collaboration/SDL_ORIGINAL/stac_copy/printer/out-s0.csv.gz** [2024/04/05-15:51:45.096] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.format] T-7LD9R0gY - [ct: 3721] after stream 
totalComp=2823848 totalUncomp=13833087 totalRec=64198 [2024/04/05-15:51:45.097] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.format] T-7LD9R0gY - [ct: 3722] Extractor run done, totalCompressed=2823848 totalRecords=64198 [2024/04/05-15:51:45.099] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 3724] Checking if splits are up to date. Policy: type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=printer),r=0.8,s=1337, instance id: 21b6a9871978f1466e881059b2b6f39f-5-part-printer [2024/04/05-15:51:45.099] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 3724] Search for split: p=type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=printer),r=0.8,s=1337 i=21b6a9871978f1466e881059b2b6f39f-5-part-printer [2024/04/05-15:51:45.109] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 3734] Search for split: p=type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=printer),r=0.8,s=1337 i=21b6a9871978f1466e881059b2b6f39f-5-part-printer [2024/04/05-15:51:45.119] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 3744] Checking if splits are up to date. 
Policy: type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=printer),r=0.8,s=1337, instance id: 21b6a9871978f1466e881059b2b6f39f-5-part-printer [2024/04/05-15:51:45.120] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 3745] Search for split: p=type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=printer),r=0.8,s=1337 i=21b6a9871978f1466e881059b2b6f39f-5-part-printer [2024/04/05-15:51:45.129] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 3754] Search for split: p=type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=printer),r=0.8,s=1337 i=21b6a9871978f1466e881059b2b6f39f-5-part-printer [2024/04/05-15:51:45.141] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 3766] Search for split: p=type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=installation),r=0.8,s=1337 i=b963b312a4d7fe28de3cd21862dacbb6-5-part-installation [2024/04/05-15:51:45.157] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.datasets.azure] T-7LD9R0gY - [ct: 3782] Building Azure handler config: {"container":"innovation-services","baseBlockID":0,"metastoreSynchronizationEnabled":true,"metastoreTableName":"stac_copy","connection":"ebstgebiadls","path":"/collaboration/SDL_ORIGINAL/stac_copy","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}} [2024/04/05-15:51:45.158] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.datasets.fsbased] T-7LD9R0gY - [ct: 3783] Get filter splits for com.dataiku.dip.input.filter.InputFilter@15f97697 [2024/04/05-15:51:45.158] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.datasets.bloblike] T-7LD9R0gY - [ct: 3783] Enumerating blob-like dataset SDL_ORIGINAL.stac_copy prefix=/installation/ [2024/04/05-15:51:45.158] 
[FT-TrainWorkThread-HhiinFh0-110689] [DEBUG] [dku.datasets.fsbased] T-7LD9R0gY - [ct: 3783] Building FS provider for dataset handler: SDL_ORIGINAL.stac_copy [2024/04/05-15:51:45.159] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.fsproviders.azure] T-7LD9R0gY - [ct: 3784] Effective root : '/service-analytics/collaboration/SDL_ORIGINAL/stac_copy' from 'service-analytics' / '/collaboration/SDL_ORIGINAL/stac_copy' [2024/04/05-15:51:45.159] [FT-TrainWorkThread-HhiinFh0-110689] [DEBUG] [dku.datasets.fsbased] T-7LD9R0gY - [ct: 3784] FS Provider built [2024/04/05-15:51:45.159] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.azure] T-7LD9R0gY - Using access token cache key: '6d1ad370-59dd-43ff-b215-09424e39101e [client secret] ' https://storage.azure.com/ (global) [2024/04/05-15:51:45.159] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.azure] T-7LD9R0gY - Cached OAuth2 access token found [2024/04/05-15:51:45.160] [FT-TrainWorkThread-HhiinFh0-110689] [DEBUG] [dku.fsproviders.azure] T-7LD9R0gY - [ct: 3785] enumerateFS prefix=/installation/ path=/service-analytics/collaboration/SDL_ORIGINAL/stac_copy/installation [2024/04/05-15:51:45.160] [FT-TrainWorkThread-HhiinFh0-110689] [DEBUG] [dku.fsproviders.azure] T-7LD9R0gY - [ct: 3785] Listing blobs with path: service-analytics/collaboration/SDL_ORIGINAL/stac_copy/installation [2024/04/05-15:51:45.179] [FT-TrainWorkThread-HhiinFh0-110689] [DEBUG] [dku.fsproviders.azure] T-7LD9R0gY - [ct: 3804] Done Azure enumeration nb_files=1 total_size=1715495 [2024/04/05-15:51:45.179] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.input.push] T-7LD9R0gY - USTP: push selection.method=FULL records=1000 ratio=0.05 col=MarketName [2024/04/05-15:51:45.179] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.format] T-7LD9R0gY - [ct: 3804] Extractor run: limit={"maxBytes":-1,"maxRecords":-1,"ordering":{"enabled":false,"rules":[]}} totalRecords=0 [2024/04/05-15:51:45.193] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku] T-7LD9R0gY - getCompression 
filename=**service-analytics/collaboration/SDL_ORIGINAL/stac_copy/installation/out-s0.csv.gz** [2024/04/05-15:51:45.205] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku] T-7LD9R0gY - getCompression filename=**service-analytics/collaboration/SDL_ORIGINAL/stac_copy/installation/out-s0.csv.gz** [2024/04/05-15:51:45.205] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.format] T-7LD9R0gY - [ct: 3830] Start compressed [GZIP] stream: /installation/out-s0.csv.gz / totalRecsBefore=0 [2024/04/05-15:51:45.205] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku] T-7LD9R0gY - getCompression filename=**service-analytics/collaboration/SDL_ORIGINAL/stac_copy/installation/out-s0.csv.gz** [2024/04/05-15:51:45.205] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku] T-7LD9R0gY - getCompression filename=**service-analytics/collaboration/SDL_ORIGINAL/stac_copy/installation/out-s0.csv.gz** [2024/04/05-15:51:45.754] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.format] T-7LD9R0gY - [ct: 4379] after stream totalComp=1715495 totalUncomp=8746441 totalRec=39766 [2024/04/05-15:51:45.754] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.format] T-7LD9R0gY - [ct: 4379] Extractor run done, totalCompressed=1715495 totalRecords=39766 [2024/04/05-15:51:45.757] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 4382] Checking if splits are up to date. 
Policy: type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=installation),r=0.8,s=1337, instance id: b963b312a4d7fe28de3cd21862dacbb6-5-part-installation [2024/04/05-15:51:45.758] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 4383] Search for split: p=type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=installation),r=0.8,s=1337 i=b963b312a4d7fe28de3cd21862dacbb6-5-part-installation [2024/04/05-15:51:45.768] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 4393] Search for split: p=type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=installation),r=0.8,s=1337 i=b963b312a4d7fe28de3cd21862dacbb6-5-part-installation [2024/04/05-15:51:45.778] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 4403] Checking if splits are up to date. Policy: type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=installation),r=0.8,s=1337, instance id: b963b312a4d7fe28de3cd21862dacbb6-5-part-installation [2024/04/05-15:51:45.778] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 4403] Search for split: p=type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=installation),r=0.8,s=1337 i=b963b312a4d7fe28de3cd21862dacbb6-5-part-installation [2024/04/05-15:51:45.788] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 4413] Search for split: p=type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=installation),r=0.8,s=1337 i=b963b312a4d7fe28de3cd21862dacbb6-5-part-installation [2024/04/05-15:51:45.798] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 4423] Search for split: 
p=type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=breakfixsupport),r=0.8,s=1337 i=bd1cbf16094f7c116e306f7b899fb059-5-part-breakfixsupport [2024/04/05-15:51:45.809] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.datasets.azure] T-7LD9R0gY - [ct: 4434] Building Azure handler config: {"container":"innovation-services","baseBlockID":0,"metastoreSynchronizationEnabled":true,"metastoreTableName":"stac_copy","connection":"ebstgebiadls","path":"/collaboration/SDL_ORIGINAL/stac_copy","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}} [2024/04/05-15:51:45.810] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.datasets.fsbased] T-7LD9R0gY - [ct: 4435] Get filter splits for com.dataiku.dip.input.filter.InputFilter@1d166a2b [2024/04/05-15:51:45.810] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.datasets.bloblike] T-7LD9R0gY - [ct: 4435] Enumerating blob-like dataset SDL_ORIGINAL.stac_copy prefix=/breakfixsupport/ [2024/04/05-15:51:45.810] [FT-TrainWorkThread-HhiinFh0-110689] [DEBUG] [dku.datasets.fsbased] T-7LD9R0gY - [ct: 4435] Building FS provider for dataset handler: SDL_ORIGINAL.stac_copy [2024/04/05-15:51:45.811] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.fsproviders.azure] T-7LD9R0gY - [ct: 4436] Effective root : '/service-analytics/collaboration/SDL_ORIGINAL/stac_copy' from 'service-analytics' / '/collaboration/SDL_ORIGINAL/stac_copy' [2024/04/05-15:51:45.811] [FT-TrainWorkThread-HhiinFh0-110689] [DEBUG] [dku.datasets.fsbased] T-7LD9R0gY - [ct: 4436] FS Provider built [2024/04/05-15:51:45.811] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.azure] T-7LD9R0gY - Using access token cache key: '6d1ad370-59dd-43ff-b215-09424e39101e [client secret] ' https://storage.azure.com/ (global) [2024/04/05-15:51:45.811] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.azure] T-7LD9R0gY - Cached OAuth2 access token found [2024/04/05-15:51:45.812] 
[FT-TrainWorkThread-HhiinFh0-110689] [DEBUG] [dku.fsproviders.azure] T-7LD9R0gY - [ct: 4437] enumerateFS prefix=/breakfixsupport/ path=/service-analytics/collaboration/SDL_ORIGINAL/stac_copy/breakfixsupport [2024/04/05-15:51:45.812] [FT-TrainWorkThread-HhiinFh0-110689] [DEBUG] [dku.fsproviders.azure] T-7LD9R0gY - [ct: 4437] Listing blobs with path: service-analytics/collaboration/SDL_ORIGINAL/stac_copy/breakfixsupport [2024/04/05-15:51:45.830] [FT-TrainWorkThread-HhiinFh0-110689] [DEBUG] [dku.fsproviders.azure] T-7LD9R0gY - [ct: 4455] Done Azure enumeration nb_files=1 total_size=3155655 [2024/04/05-15:51:45.830] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.input.push] T-7LD9R0gY - USTP: push selection.method=FULL records=1000 ratio=0.05 col=MarketName [2024/04/05-15:51:45.830] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.format] T-7LD9R0gY - [ct: 4455] Extractor run: limit={"maxBytes":-1,"maxRecords":-1,"ordering":{"enabled":false,"rules":[]}} totalRecords=0 [2024/04/05-15:51:45.845] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku] T-7LD9R0gY - getCompression filename=**service-analytics/collaboration/SDL_ORIGINAL/stac_copy/breakfixsupport/out-s0.csv.gz** [2024/04/05-15:51:45.857] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku] T-7LD9R0gY - getCompression filename=**service-analytics/collaboration/SDL_ORIGINAL/stac_copy/breakfixsupport/out-s0.csv.gz** [2024/04/05-15:51:45.857] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.format] T-7LD9R0gY - [ct: 4482] Start compressed [GZIP] stream: /breakfixsupport/out-s0.csv.gz / totalRecsBefore=0 [2024/04/05-15:51:45.857] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku] T-7LD9R0gY - getCompression filename=**service-analytics/collaboration/SDL_ORIGINAL/stac_copy/breakfixsupport/out-s0.csv.gz** [2024/04/05-15:51:45.858] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku] T-7LD9R0gY - getCompression filename=**service-analytics/collaboration/SDL_ORIGINAL/stac_copy/breakfixsupport/out-s0.csv.gz** 
[2024/04/05-15:51:46.805] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.format] T-7LD9R0gY - [ct: 5430] after stream totalComp=3155655 totalUncomp=13576652 totalRec=62728 [2024/04/05-15:51:46.805] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.format] T-7LD9R0gY - [ct: 5430] Extractor run done, totalCompressed=3155655 totalRecords=62728 [2024/04/05-15:51:46.808] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 5433] Checking if splits are up to date. Policy: type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=breakfixsupport),r=0.8,s=1337, instance id: bd1cbf16094f7c116e306f7b899fb059-5-part-breakfixsupport [2024/04/05-15:51:46.808] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 5433] Search for split: p=type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=breakfixsupport),r=0.8,s=1337 i=bd1cbf16094f7c116e306f7b899fb059-5-part-breakfixsupport [2024/04/05-15:51:46.818] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 5443] Search for split: p=type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=breakfixsupport),r=0.8,s=1337 i=bd1cbf16094f7c116e306f7b899fb059-5-part-breakfixsupport [2024/04/05-15:51:46.828] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 5453] Checking if splits are up to date. 
Policy: type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=breakfixsupport),r=0.8,s=1337, instance id: bd1cbf16094f7c116e306f7b899fb059-5-part-breakfixsupport [2024/04/05-15:51:46.828] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 5453] Search for split: p=type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=breakfixsupport),r=0.8,s=1337 i=bd1cbf16094f7c116e306f7b899fb059-5-part-breakfixsupport [2024/04/05-15:51:46.837] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 5462] Search for split: p=type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=breakfixsupport),r=0.8,s=1337 i=bd1cbf16094f7c116e306f7b899fb059-5-part-breakfixsupport [2024/04/05-15:51:46.848] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 5473] Search for split: p=type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=supply),r=0.8,s=1337 i=06b2a2dbf0189fa780cbc776dca74d56-5-part-supply [2024/04/05-15:51:46.859] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.datasets.azure] T-7LD9R0gY - [ct: 5484] Building Azure handler config: {"container":"innovation-services","baseBlockID":0,"metastoreSynchronizationEnabled":true,"metastoreTableName":"stac_copy","connection":"ebstgebiadls","path":"/collaboration/SDL_ORIGINAL/stac_copy","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}} [2024/04/05-15:51:46.859] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.datasets.fsbased] T-7LD9R0gY - [ct: 5484] Get filter splits for com.dataiku.dip.input.filter.InputFilter@17e77f69 [2024/04/05-15:51:46.860] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.datasets.bloblike] T-7LD9R0gY - [ct: 5485] Enumerating blob-like dataset SDL_ORIGINAL.stac_copy prefix=/supply/ 
[2024/04/05-15:51:46.860] [FT-TrainWorkThread-HhiinFh0-110689] [DEBUG] [dku.datasets.fsbased] T-7LD9R0gY - [ct: 5485] Building FS provider for dataset handler: SDL_ORIGINAL.stac_copy
[2024/04/05-15:51:46.860] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.fsproviders.azure] T-7LD9R0gY - [ct: 5485] Effective root : '/service-analytics/collaboration/SDL_ORIGINAL/stac_copy' from 'service-analytics' / '/collaboration/SDL_ORIGINAL/stac_copy'
[2024/04/05-15:51:46.861] [FT-TrainWorkThread-HhiinFh0-110689] [DEBUG] [dku.datasets.fsbased] T-7LD9R0gY - [ct: 5486] FS Provider built
[2024/04/05-15:51:46.861] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.azure] T-7LD9R0gY - Using access token cache key: '6d1ad370-59dd-43ff-b215-09424e39101e [client secret] ' https://storage.azure.com/ (global)
[2024/04/05-15:51:46.861] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.azure] T-7LD9R0gY - Cached OAuth2 access token found
[2024/04/05-15:51:46.861] [FT-TrainWorkThread-HhiinFh0-110689] [DEBUG] [dku.fsproviders.azure] T-7LD9R0gY - [ct: 5486] enumerateFS prefix=/supply/ path=/service-analytics/collaboration/SDL_ORIGINAL/stac_copy/supply
[2024/04/05-15:51:46.861] [FT-TrainWorkThread-HhiinFh0-110689] [DEBUG] [dku.fsproviders.azure] T-7LD9R0gY - [ct: 5486] Listing blobs with path: service-analytics/collaboration/SDL_ORIGINAL/stac_copy/supply
[2024/04/05-15:51:46.880] [FT-TrainWorkThread-HhiinFh0-110689] [DEBUG] [dku.fsproviders.azure] T-7LD9R0gY - [ct: 5505] Done Azure enumeration nb_files=1 total_size=14265165
[2024/04/05-15:51:46.880] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.input.push] T-7LD9R0gY - USTP: push selection.method=FULL records=1000 ratio=0.05 col=MarketName
[2024/04/05-15:51:46.881] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.format] T-7LD9R0gY - [ct: 5506] Extractor run: limit={"maxBytes":-1,"maxRecords":-1,"ordering":{"enabled":false,"rules":[]}} totalRecords=0
[2024/04/05-15:51:46.894] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku] T-7LD9R0gY - getCompression filename=**service-analytics/collaboration/SDL_ORIGINAL/stac_copy/supply/out-s0.csv.gz**
[2024/04/05-15:51:46.905] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku] T-7LD9R0gY - getCompression filename=**service-analytics/collaboration/SDL_ORIGINAL/stac_copy/supply/out-s0.csv.gz**
[2024/04/05-15:51:46.906] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.format] T-7LD9R0gY - [ct: 5531] Start compressed [GZIP] stream: /supply/out-s0.csv.gz / totalRecsBefore=0
[2024/04/05-15:51:46.906] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku] T-7LD9R0gY - getCompression filename=**service-analytics/collaboration/SDL_ORIGINAL/stac_copy/supply/out-s0.csv.gz**
[2024/04/05-15:51:46.906] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku] T-7LD9R0gY - getCompression filename=**service-analytics/collaboration/SDL_ORIGINAL/stac_copy/supply/out-s0.csv.gz**
[2024/04/05-15:51:48.468] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.format.csv] T-7LD9R0gY - [ct: 7093] CSV Emitted 100000 lines from file, 20 columns - interned: 50477 MEM: 24.6826171875%
[2024/04/05-15:51:49.894] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.format.csv] T-7LD9R0gY - [ct: 8519] CSV Emitted 200000 lines from file, 20 columns - interned: 100732 MEM: 24.6826171875%
[2024/04/05-15:51:50.926] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.format] T-7LD9R0gY - [ct: 9551] after stream totalComp=14265165 totalUncomp=58383153 totalRec=271355
[2024/04/05-15:51:50.927] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.format] T-7LD9R0gY - [ct: 9552] Extractor run done, totalCompressed=14265165 totalRecords=271355
[2024/04/05-15:51:50.929] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 9554] Checking if splits are up to date. Policy: type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=supply),r=0.8,s=1337, instance id: 06b2a2dbf0189fa780cbc776dca74d56-5-part-supply
[2024/04/05-15:51:50.930] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 9555] Search for split: p=type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=supply),r=0.8,s=1337 i=06b2a2dbf0189fa780cbc776dca74d56-5-part-supply
[2024/04/05-15:51:50.940] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 9565] Search for split: p=type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=supply),r=0.8,s=1337 i=06b2a2dbf0189fa780cbc776dca74d56-5-part-supply
[2024/04/05-15:51:50.949] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 9574] Checking if splits are up to date. Policy: type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=supply),r=0.8,s=1337, instance id: 06b2a2dbf0189fa780cbc776dca74d56-5-part-supply
[2024/04/05-15:51:50.949] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 9574] Search for split: p=type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=supply),r=0.8,s=1337 i=06b2a2dbf0189fa780cbc776dca74d56-5-part-supply
[2024/04/05-15:51:50.959] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 9584] Search for split: p=type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=supply),r=0.8,s=1337 i=06b2a2dbf0189fa780cbc776dca74d56-5-part-supply
[2024/04/05-15:51:50.971] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 9596] Search for split: p=type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=accessory),r=0.8,s=1337 i=e5e999b9803e27ac083f8586a7564463-5-part-accessory
[2024/04/05-15:51:50.982] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.datasets.azure] T-7LD9R0gY - [ct: 9607] Building Azure handler config: {"container":"innovation-services","baseBlockID":0,"metastoreSynchronizationEnabled":true,"metastoreTableName":"stac_copy","connection":"ebstgebiadls","path":"/collaboration/SDL_ORIGINAL/stac_copy","notReadyIfEmpty":false,"filesSelectionRules":{"mode":"ALL","excludeRules":[],"includeRules":[],"explicitFiles":[]}}
[2024/04/05-15:51:50.983] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.datasets.fsbased] T-7LD9R0gY - [ct: 9608] Get filter splits for com.dataiku.dip.input.filter.InputFilter@1824ae4b
[2024/04/05-15:51:50.983] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.datasets.bloblike] T-7LD9R0gY - [ct: 9608] Enumerating blob-like dataset SDL_ORIGINAL.stac_copy prefix=/accessory/
[2024/04/05-15:51:50.983] [FT-TrainWorkThread-HhiinFh0-110689] [DEBUG] [dku.datasets.fsbased] T-7LD9R0gY - [ct: 9608] Building FS provider for dataset handler: SDL_ORIGINAL.stac_copy
[2024/04/05-15:51:50.984] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.fsproviders.azure] T-7LD9R0gY - [ct: 9609] Effective root : '/service-analytics/collaboration/SDL_ORIGINAL/stac_copy' from 'service-analytics' / '/collaboration/SDL_ORIGINAL/stac_copy'
[2024/04/05-15:51:50.984] [FT-TrainWorkThread-HhiinFh0-110689] [DEBUG] [dku.datasets.fsbased] T-7LD9R0gY - [ct: 9609] FS Provider built
[2024/04/05-15:51:50.984] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.azure] T-7LD9R0gY - Using access token cache key: '6d1ad370-59dd-43ff-b215-09424e39101e [client secret] ' https://storage.azure.com/ (global)
[2024/04/05-15:51:50.984] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.azure] T-7LD9R0gY - Cached OAuth2 access token found
[2024/04/05-15:51:50.984] [FT-TrainWorkThread-HhiinFh0-110689] [DEBUG] [dku.fsproviders.azure] T-7LD9R0gY - [ct: 9609] enumerateFS prefix=/accessory/ path=/service-analytics/collaboration/SDL_ORIGINAL/stac_copy/accessory
[2024/04/05-15:51:50.985] [FT-TrainWorkThread-HhiinFh0-110689] [DEBUG] [dku.fsproviders.azure] T-7LD9R0gY - [ct: 9610] Listing blobs with path: service-analytics/collaboration/SDL_ORIGINAL/stac_copy/accessory
[2024/04/05-15:51:51.001] [FT-TrainWorkThread-HhiinFh0-110689] [DEBUG] [dku.fsproviders.azure] T-7LD9R0gY - [ct: 9626] Done Azure enumeration nb_files=1 total_size=6397685
[2024/04/05-15:51:51.002] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.input.push] T-7LD9R0gY - USTP: push selection.method=FULL records=1000 ratio=0.05 col=MarketName
[2024/04/05-15:51:51.002] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.format] T-7LD9R0gY - [ct: 9627] Extractor run: limit={"maxBytes":-1,"maxRecords":-1,"ordering":{"enabled":false,"rules":[]}} totalRecords=0
[2024/04/05-15:51:51.015] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku] T-7LD9R0gY - getCompression filename=**service-analytics/collaboration/SDL_ORIGINAL/stac_copy/accessory/out-s0.csv.gz**
[2024/04/05-15:51:51.027] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku] T-7LD9R0gY - getCompression filename=**service-analytics/collaboration/SDL_ORIGINAL/stac_copy/accessory/out-s0.csv.gz**
[2024/04/05-15:51:51.027] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.format] T-7LD9R0gY - [ct: 9652] Start compressed [GZIP] stream: /accessory/out-s0.csv.gz / totalRecsBefore=0
[2024/04/05-15:51:51.028] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku] T-7LD9R0gY - getCompression filename=**service-analytics/collaboration/SDL_ORIGINAL/stac_copy/accessory/out-s0.csv.gz**
[2024/04/05-15:51:51.028] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku] T-7LD9R0gY - getCompression filename=**service-analytics/collaboration/SDL_ORIGINAL/stac_copy/accessory/out-s0.csv.gz**
[2024/04/05-15:51:52.529] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.format.csv] T-7LD9R0gY - [ct: 11154] CSV Emitted 100000 lines from file, 20 columns - interned: 44691 MEM: 24.6826171875%
[2024/04/05-15:51:52.911] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.format] T-7LD9R0gY - [ct: 11536] after stream totalComp=6397685 totalUncomp=27468903 totalRec=128601
[2024/04/05-15:51:52.911] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.format] T-7LD9R0gY - [ct: 11536] Extractor run done, totalCompressed=6397685 totalRecords=128601
[2024/04/05-15:51:52.914] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 11539] Checking if splits are up to date. Policy: type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=accessory),r=0.8,s=1337, instance id: e5e999b9803e27ac083f8586a7564463-5-part-accessory
[2024/04/05-15:51:52.914] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 11539] Search for split: p=type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=accessory),r=0.8,s=1337 i=e5e999b9803e27ac083f8586a7564463-5-part-accessory
[2024/04/05-15:51:52.924] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 11549] Search for split: p=type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=accessory),r=0.8,s=1337 i=e5e999b9803e27ac083f8586a7564463-5-part-accessory
[2024/04/05-15:51:52.935] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 11560] Checking if splits are up to date. Policy: type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=accessory),r=0.8,s=1337, instance id: e5e999b9803e27ac083f8586a7564463-5-part-accessory
[2024/04/05-15:51:52.935] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 11560] Search for split: p=type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=accessory),r=0.8,s=1337 i=e5e999b9803e27ac083f8586a7564463-5-part-accessory
[2024/04/05-15:51:52.946] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.splits] T-7LD9R0gY - [ct: 11571] Search for split: p=type=SPLIT_SINGLE_DATASET,split=RANDOM,splitBeforePrepare=true,ds=stac_copy,sel=(method=full,parts=accessory),r=0.8,s=1337 i=e5e999b9803e27ac083f8586a7564463-5-part-accessory
[2024/04/05-15:51:52.962] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.ml] T-7LD9R0gY - [ct: 11587] Locking model train info file /data/dss_data/analysis-data/SDL_ORIGINAL/NE9biEL2/7LD9R0gY/sessions/s23/pp1-base/m1/train_info.json
[2024/04/05-15:51:52.964] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.ml] T-7LD9R0gY - [ct: 11589] Unlocking model train info file /data/dss_data/analysis-data/SDL_ORIGINAL/NE9biEL2/7LD9R0gY/sessions/s23/pp1-base/m1/train_info.json
[2024/04/05-15:51:52.964] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.prediction.strat.prns] T-7LD9R0gY - [ct: 11589] Launching the training threads
[2024/04/05-15:51:52.967] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.ml.python] T-7LD9R0gY - [ct: 11592] Joining processing thread ...
[2024/04/05-15:52:53.166] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.ml.python] T-7LD9R0gY - [ct: 71791] Processing thread joined ...
[2024/04/05-15:52:53.167] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis] T-7LD9R0gY - [ct: 71792] Train done
[2024/04/05-15:52:53.167] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.prediction] T-7LD9R0gY - [ct: 71792] Train done
[2024/04/05-15:52:53.177] [FT-TrainWorkThread-HhiinFh0-110689] [INFO] [dku.analysis.trainingdetails] T-7LD9R0gY - Publishing mltask-train-done reflected event