In BigQuery, Check If A Substring Of One Column Is In Another Column (or Across Tables)

I am trying to compare MCC ("Merchant Category Code") values from two data sources by matching on the business name rather than on the code. Because the names come from two sources, they are not identical. The goal is to check how often the MCC for the same business matches or does not match.

For instance, Home Depot may be listed as 'Home Depot', 'The Home Depot', or 'Home Depot #276'.

As a first try, I am pulling the first 5 characters of the name from one data source, then trying to match that substring against the 'master', or at least cleaner, data source.

Ultimately, though, I need to check whether the value in one column is contained in the value of the other column.

I have tried CONTAINS_SUBSTR, which only accepts a string literal as the search value, so I can't pass it the column. I have also been trying REGEXP_CONTAINS, which throws an error:

"Cannot parse regular expression: no argument for repetition operator: *"

I've tried multiple variations of code like this:

select *
from space.linking_t.MCCs_join_test
where REGEXP_CONTAINS(entity_name, CONCAT(r"(?i)", merch_name_5, r"",''))

Here, 'entity_name' is the master column to match against and 'merch_name_5' is the substring.
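
One possible way around the error, sketched under an assumption about the data: "no argument for repetition operator: *" is what the regex engine reports when a * has nothing before it to repeat. That would happen if some merch_name_5 values contain characters such as * that are special in a regular expression (this is a guess, since the data itself is not shown). If a plain, case-insensitive containment check is all that is needed, STRPOS on lower-cased values avoids regular expressions entirely. The table and column names below are the ones from the question.

```sql
-- Sketch: case-insensitive "one column contains the other" check without regex.
-- STRPOS returns the 1-based position of the second argument inside the first,
-- or 0 when it does not occur, so "> 0" means "entity_name contains merch_name_5".
-- Assumption: literal substring matching is acceptable (no pattern features needed).
SELECT *
FROM space.linking_t.MCCs_join_test
WHERE STRPOS(LOWER(entity_name), LOWER(merch_name_5)) > 0
```

Because the substring is treated as literal text here, characters such as * or ( in a merchant name cannot break the query. Rows where either column is NULL are dropped by the filter. If regular-expression features are really needed, the substring would have to be escaped before it is concatenated into the pattern.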

