Sign up to take part
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
Registered users can ask their own questions, contribute to discussions, and be part of the Community!
I succeeded in doing the "join" (actually it is not, but for my data it is). The goal is to tag each record from DATASET1 for being/not being in a specific time period. Time periods are defined in DATASET2.
I did it using Python.
For those interested, here is the script (As i'm not a Python expert, more elegant code can be proposed):
# -*- coding: utf-8 -*-
import dataiku
import pandas as pd, numpy as np
from dataiku import pandasutils as pdu
# Input datasets
DATASET1 = dataiku.Dataset("DATASET1")
DATASET1_df = histo_MAG_JF_ZV.get_dataframe()
DATASET2 = dataiku.Dataset("DATASET2")
DATASET2_df = soldes_prepared.get_dataframe()
# Output datasets
DATASET1_outuput = dataiku.Dataset("DATASET1_outuput")
# Pour chaque ligne du tableau de Soldes
for d in DATASET2.iter_rows():
# FOR DEBUG
#print d
begin_date = d['begin']
end_date = d['end']
cond1 = (DATASET1_df['Date'] >= begin_date)
cond2 = (DATASET1_df['Date'] <= end_date)
matching_records = DATASET1_df[cond1][cond2]
# Add begin/end date to output dataset
matching_records['begin_date'] = begin_date
matching_records['end_date'] = end_date
DATASET1_outuput.write_with_schema(matching_records)
Next I left join DATASET1 with DATASET1_output to recover non mathing records, as the following analysis needs both matching and non matching records.
Geoff