Is there a way to count specific types of characters in a text cell?

UserBird · ‎06-21-2015

I am trying to enrich a dataset containing product names and descriptions, and I would like to count the words, the uppercase and lowercase letters, and the numbers in certain columns.

Is there any way to do this easily?

Thomas · ‎06-22-2015

Hi Vincent,

One way to do it is to use a Custom Python Script in Analyze, which lets you implement your own logic. For example, if you want to count specific types of characters in a string (here, in a column called name), you could do the following:


import json

def process(row):
  # Assumption: an empty cell may arrive as None, so fall back to an empty string
  name = row['name'] or ''

  # Initialize counters
  _uppers = 0
  _lowers = 0
  _commas = 0
  _digits = 0

  # Each character falls into at most one of these categories
  for character in name:
    if character.isupper(): # uppercase letter
      _uppers += 1
    elif character.islower(): # lowercase letter
      _lowers += 1
    elif character == ',': # comma
      _commas += 1
    elif character.isdigit(): # digit
      _digits += 1

  # Return all counts as a single JSON string
  return json.dumps({
    'count_uppercase_values': _uppers,
    'count_lowercase_values': _lowers,
    'count_commas': _commas,
    'count_digits': _digits,
  })

The cool thing is that you can output as many counts as you want and pass the result to a Flatten JSON processor to create your columns.
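Since the question also asks about words, here is a rough sketch of a variant that adds a word count. It assumes words are separated by whitespace and that an empty cell may arrive as None. The helper name count_characters and the count_words key are made up for illustration and are not part of the product; only process(row) and the name column come from the script above.

```python
import json

def count_characters(text):
    """Count words and a few character types in a string (hypothetical helper)."""
    text = text or ''  # assumption: a missing value may be None
    return {
        # Words are taken to be whitespace-separated tokens
        'count_words': len(text.split()),
        'count_uppercase_values': sum(1 for c in text if c.isupper()),
        'count_lowercase_values': sum(1 for c in text if c.islower()),
        'count_commas': text.count(','),
        'count_digits': sum(1 for c in text if c.isdigit()),
    }

def process(row):
    # Same entry point as the script above; 'name' is the column to analyze
    return json.dumps(count_characters(row['name']))
```

Keeping the counting in its own function means you can try it on a plain string outside the tool, for example count_characters('Blue Widget, 2 Pack'), before wiring it into the processor.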
