0 votes
Is it somehow possible to oversample my dataset?

for example, I have such records and target variables

1 2 3 | 5
2 2 3 | 6
1 1 1 | 1
3 2 2 | 5

I want to duplicate (or generate more than one duplicate)  row #3 and make my dataset looks as follows:
1 2 3 | 5
2 2 3 | 6
1 1 1 | 1
1 1 1 | 1
3 2 2 | 5

How can I do this?
Thank you in advance!
asked by Sergey
It's not very clear from your example on which criterion you want to oversample ? Is it based on the counts in the target variable ?
Strictly speaking - yes. It is based on the counts in the target. When I have multiple records that lies between 5-6 and small number of records out of this range, I need to oversample the training set and make it more balanced. Especially when it is needed to found some reasons which lead to rare target values (in this case - 1).

Please log in or register to answer this question.

365 questions
392 answers
224 comments
230 users