There is currently no way to remove duplicate rows in a visual Prepare recipe*. A Prepare recipe works more or less row by row and cannot operate on a full column at once; this is by design, so that it scales to big data.
It's possible to do it in a visual Group recipe: click “Show mass actions”, select all columns, then click “use as grouping keys”. If the CSV is very big, I suggest synchronizing it to a SQL database first.
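Why this works: grouping on every column with no aggregation leaves exactly one row per distinct combination of values. Here is a minimal pandas sketch of the same idea, outside Dataiku. The function name `distinct_rows_via_groupby` is hypothetical, and this only mimics the effect of the Group recipe; it is not how the recipe is implemented.

```python
import pandas as pd


def distinct_rows_via_groupby(df: pd.DataFrame) -> pd.DataFrame:
    """Mimic a Group recipe that uses every column as a grouping key.

    Grouping on all columns with no aggregation leaves one row per
    distinct combination of values, i.e. duplicate rows are removed.
    """
    keys = list(df.columns)
    # dropna=False keeps rows whose keys contain missing values;
    # sort=False keeps the groups in order of first appearance.
    grouped = df.groupby(keys, as_index=False, dropna=False, sort=False).size()
    # .size() adds a "size" column (row count per group); drop it.
    return grouped.drop(columns="size")


df = pd.DataFrame({"a": [1, 1, 2], "b": ["x", "x", "y"]})
print(distinct_rows_via_groupby(df))
```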
You can also do so in coding recipes:
- In a Python recipe, you can use the Pandas function drop_duplicates().
- In an R recipe, you have several alternatives (duplicated(), dplyr, ...): read here
- In a SQL recipe, I would use a GROUP BY with MIN or MAX, or a window function that partitions by the key and keeps the first row.
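For the Python and SQL options, here is a small self-contained sketch. The sample data and column names (`id`, `city`, `visits`) are made up for illustration, and an in-memory SQLite database stands in for your real SQL connection. Note that "the first row" in SQL only has a meaning once you give an explicit ORDER BY inside the window; ordering by `visits` here is an arbitrary choice.

```python
import sqlite3

import pandas as pd

# Hypothetical sample data with exact duplicates and repeated keys.
df = pd.DataFrame({
    "id": [1, 1, 2, 3, 3],
    "city": ["Paris", "Paris", "Lyon", "Nice", "Nice"],
    "visits": [10, 10, 4, 7, 9],
})

# Pandas: drop rows that are identical on every column,
# keeping the first occurrence.
deduped_all = df.drop_duplicates()

# Pandas: keep only the first row for each value of a key column.
deduped_by_key = df.drop_duplicates(subset=["id"], keep="first")

# SQL: number the rows inside each key partition and keep row number 1.
# Window functions require SQLite 3.25 or later.
con = sqlite3.connect(":memory:")
df.to_sql("events", con, index=False)
query = """
SELECT id, city, visits
FROM (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY id ORDER BY visits) AS rn
    FROM events
)
WHERE rn = 1
ORDER BY id
"""
deduped_sql = pd.read_sql_query(query, con)
con.close()

print(deduped_all)
print(deduped_by_key)
print(deduped_sql)
```

In a real SQL recipe you would run only the SELECT against your own table; the pandas-to-SQLite round trip is just there to make the sketch runnable on its own.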
* There is actually one way to do it in a visual Prepare recipe, with a custom Python function, but it will not always work (for example, when the recipe runs multi-threaded), so I would not recommend this trick.
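To show why the trick is fragile, here is a rough sketch of the idea in plain Python. It assumes a row-level step that calls a `process(row)` function once per row, with `row` being a dict of column name to value; that shape is an assumption, and the returned labels are hypothetical. The step flags repeated rows so they can be filtered out afterwards. The whole approach depends on the module-level `seen` set being shared by every row, in order. If the rows are processed by several threads or in separate chunks, each with its own copy of this state, that assumption breaks and duplicates can slip through.

```python
# Module-level state shared across calls: this is the fragile part.
# It only works if every row passes through this one copy of `seen`.
seen = set()


def process(row):
    """Return "duplicate" if this exact row was already seen, else "first".

    Assumes `row` is a dict mapping column name -> hashable value.
    """
    # Sort by column name so the key does not depend on column order.
    key = tuple(sorted(row.items()))
    if key in seen:
        return "duplicate"
    seen.add(key)
    return "first"


print(process({"a": 1, "b": "x"}))
print(process({"b": "x", "a": 1}))
```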
I hope that helps,