Hello,
I'm a Dataiku and ML beginner, so please excuse my (maybe) simple question.
I have a dataset with data on internet companies. Originally, the information on target markets (column: markets) came as a single string separated by ">" and ",". There are some extra columns to the right, e.g. number of employees, financing, etc.
My goal is to create a model with "activity" as the target variable (it has 3 values: operating, acquired, and non-operating), e.g. to identify the markets most promising for "survival", or the most dangerous ones (those most associated with "non-operating").
My original file had 1 record per company (approx. 1,000 companies), with only the "markets" column. I started by splitting it, first with ">" and then with "," as separators. Finally (after some cleaning and merging), I got a dataset with many records per company, as displayed below, with distinct "market__" features.
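For anyone who wants to reproduce this kind of reshaping outside the Dataiku visual recipes, here is a minimal plain-Python sketch of the split described above. It assumes each company's markets arrive as one string in which both ">" and "," act as separators; the field names `company`, `markets` and `market` and the helper `explode_markets` are hypothetical, not part of Dataiku's API.

```python
def explode_markets(rows):
    """Split each company's 'markets' string into one record per market.

    rows: list of dicts with at least a 'markets' key (hypothetical name).
    Every other key (e.g. company, employees, financing, activity) is copied
    into each resulting record, giving a "many records per company" layout.
    Companies whose markets string is empty produce no records.
    """
    out = []
    for row in rows:
        # Split on '>' first, then on ',' (the order described above), trimming spaces.
        parts = []
        for chunk in row["markets"].split(">"):
            parts.extend(p.strip() for p in chunk.split(","))
        # Drop empty tokens and duplicates while keeping the original order.
        for market in dict.fromkeys(p for p in parts if p):
            record = {k: v for k, v in row.items() if k != "markets"}
            record["market"] = market
            out.append(record)
    return out


rows = [{"company": "A", "markets": "Software > SaaS, Analytics",
         "employees": 10, "activity": "operating"}]
for r in explode_markets(rows):
    print(r)
```

Note that every copied record repeats the same "activity" label, so one company with many markets ends up contributing many rows to the training data.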
My questions:
1. Is it OK for an ML model to keep the data on a single company in the form of many records (see picture below)?
2. Are there any other data preparation steps (folding, splitting, transformation, etc.) you would recommend?
I would greatly appreciate your help.
Many thanks in advance,
Andy