How to remove rows with all emptly values across all columns

Solved!
bkmyt
Level 2
How to remove rows with all emptly values across all columns

Pretty sure this is a basic question, however I was not able to find a solution so far.



After removing some columns, I ended up with rows that have an empty value in each column (tooltip shows "null"), which I want to remove now.



I tried the following processor ("filter rows/cells on value"):



However, the empty rows are not removed. A counter-check with "Action: Only keep matching rows" removes all rows.



The "Remove rows where cell is empty-processor" in combination with selecting "all" columns removes each row with at least one empty column (which is not what I want) as described in https://doc.dataiku.com/dss/latest/preparation/processors/remove-empty.html.



Am I missing something? Thanks!

0 Kudos
1 Solution
Alex_Combessie
Dataiker Alumni

Hello,



In the meantime, you can use the following trick:



1. Add a Python function step in "cell" mode to output an "all_empty" column, with the following code




def process(row):
all_empty = all(
[str(v).strip() == '' or v is None
for k,v in row.items() if k != 'all_empty']
)
return(all_empty)


2. Add a filter on value step to remove the rows where the "all_empty" indicator column is True



โ€‹



Hope it helps,



Alex

View solution in original post

0 Kudos
5 Replies
Clรฉment_Stenac
Hi,

There are currently some known issues with this processor in "ALL" mode. You'll need to add the columns explicitly for the moment.
bkmyt
Level 2
Author
Thanks for the quick reply.
Unfortunately, nothing happens when adding all the columns explicitly (i.e. no rows are removed). Even when selecting only one column, no rows are removed (agian, counter-check with "Action: only keep matching rows" removes all rows).
0 Kudos
SimonM
Level 2

I'm still seeing the same behaviour unfortunately, however for future readers a simpler option is described here using filter by formula:

https://community.dataiku.com/t5/Using-Dataiku-DSS/Remove-row-if-two-column-are-both-empty/m-p/1285

0 Kudos
Alex_Combessie
Dataiker Alumni

Hello,



In the meantime, you can use the following trick:



1. Add a Python function step in "cell" mode to output an "all_empty" column, with the following code




def process(row):
all_empty = all(
[str(v).strip() == '' or v is None
for k,v in row.items() if k != 'all_empty']
)
return(all_empty)


2. Add a filter on value step to remove the rows where the "all_empty" indicator column is True



โ€‹



Hope it helps,



Alex

0 Kudos
DavidALI
Level 3

Hi, 

I did the following steps : 

- Concat all columns in a new one (named concat_all)

- Delete rows where concat_all is empty

hope this help

Labels

?
Labels (2)
A banner prompting to get Dataiku