Filtering with whitespace regex not working

AKAK Member Posts: 2 Learner III
edited November 2018 in Help

I have two nominal attributes, Entity and EntityType. Entity contains names of people and organizations, and EntityType holds the type of each entity (ORGANIZATION or PERSON). I am trying to keep only the rows where EntityType is PERSON and Entity contains a full name; single names should be omitted.

 

I am using the Filter Examples operator with the attribute_value_filter condition class. The parameter string is set as follows:

EntityType = PERSON && Entity = [a-zA-Z]*\s.*

This does not select any of the names. I think I have isolated the issue to matching whitespace in Entity: any filter that relies on whitespace fails. For example, each of the following parameter strings failed to match Bart Simpson:

Entity = Bart\sSimpson

Entity = Bart[ ]Simpson

Entity = Bart Simpson

 

However, this one worked:

Entity = Bart.+Simpson
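One explanation that would fit this exact pattern of results (not confirmed anywhere in this thread) is that the separator in the data is not an ordinary space but a no-break space (U+00A0), which often turns up in text copied from web pages. RapidMiner runs on Java, and in Java's default regex mode \s only matches [ \t\n\x0B\f\r], so \s, [ ] and a literal space all miss U+00A0, while . does match it. The sketch below checks the four patterns against both variants. It assumes the filter performs a whole-value match like Java's Pattern.matches; the class and helper names are hypothetical.

```java
import java.util.regex.Pattern;

public class WhitespaceDemo {
    // Hypothetical helper: true if the whole value matches the regex
    // (assumes the filter behaves like Java's Pattern.matches).
    static boolean fullMatch(String regex, String value) {
        return Pattern.matches(regex, value);
    }

    public static void main(String[] args) {
        // The four parameter values from the question, as Java regex strings.
        String[] patterns = {"Bart\\sSimpson", "Bart[ ]Simpson", "Bart Simpson", "Bart.+Simpson"};
        String asciiSpace = "Bart Simpson";         // ordinary space, U+0020
        String noBreakSpace = "Bart\u00A0Simpson";  // no-break space, U+00A0

        for (String p : patterns) {
            System.out.printf("%-16s ascii=%-5b nbsp=%b%n",
                    p, fullMatch(p, asciiSpace), fullMatch(p, noBreakSpace));
        }
        // Every pattern matches the ASCII-space value; only Bart.+Simpson
        // matches the U+00A0 value.
    }
}
```

If this turns out to be the cause, normalizing the text before filtering (for example, replacing U+00A0 with a regular space) would be one way around it; whether the original data actually contains no-break spaces is not established here.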

 

Any ideas on how to match full names?

Below is an example of how I want my filter to work.

 

Input
Entity | EntityType
Fox Network | ORGANIZATION
Homer J. Simpson | PERSON
Homer | PERSON
Marge | PERSON
Bart Simpson | PERSON
Lisa Simpson | PERSON

Desired Output
Entity | EntityType
Homer J. Simpson | PERSON
Bart Simpson | PERSON
Lisa Simpson | PERSON
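The intended logic can be sketched in plain Java against the sample table above. This is not RapidMiner's filter syntax: it reuses the question's [a-zA-Z]*\s.* idea but widens the whitespace class to also accept U+00A0, on the unverified assumption that no-break spaces may be present. Whether the same character class works inside the attribute_value_filter parameter string is untested. FullNameFilter and filter are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class FullNameFilter {
    // Variant of the question's regex: letters, then one whitespace character
    // (ASCII \s or U+00A0 no-break space), then anything. Hypothetical, for illustration.
    static final Pattern FULL_NAME = Pattern.compile("[a-zA-Z]*[\\s\\u00A0].*");

    // Keeps the Entity value of every PERSON row whose Entity matches FULL_NAME in full.
    static List<String> filter(String[][] rows) {
        List<String> kept = new ArrayList<>();
        for (String[] row : rows) {
            String entity = row[0];
            String entityType = row[1];
            if (entityType.equals("PERSON") && FULL_NAME.matcher(entity).matches()) {
                kept.add(entity);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // The Input table from the question: {Entity, EntityType}.
        String[][] rows = {
            {"Fox Network", "ORGANIZATION"},
            {"Homer J. Simpson", "PERSON"},
            {"Homer", "PERSON"},
            {"Marge", "PERSON"},
            {"Bart Simpson", "PERSON"},
            {"Lisa Simpson", "PERSON"},
        };
        System.out.println(filter(rows)); // [Homer J. Simpson, Bart Simpson, Lisa Simpson]
    }
}
```

Single names such as Homer and Marge contain no whitespace, so they fail the match, and Fox Network is dropped by the EntityType check.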

 


Answers

  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn

    I'm finding that Bart\sSimpson works on the data sample you gave. What if it isn't a single space?

    Have you tried Bart\s+Simpson, for example?
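    To illustrate the suggestion: \s matches exactly one whitespace character, so Bart\sSimpson fails when the names are separated by two spaces, while Bart\s+Simpson accepts one or more. In Java's default regex mode, neither pattern matches a no-break space (U+00A0). A minimal sketch with a hypothetical helper, again assuming whole-value matching:

```java
import java.util.regex.Pattern;

public class OneOrMoreSpaces {
    // Hypothetical helper: true if the whole value matches the regex.
    static boolean fullMatch(String regex, String value) {
        return Pattern.matches(regex, value);
    }

    public static void main(String[] args) {
        // Single space, double space, and tab between the names.
        String[] values = {"Bart Simpson", "Bart  Simpson", "Bart\tSimpson"};
        for (String v : values) {
            System.out.printf("%-16s \\s=%-5b \\s+=%b%n",
                    v.replace("\t", "\\t"),
                    fullMatch("Bart\\sSimpson", v),
                    fullMatch("Bart\\s+Simpson", v));
        }
        // Only the double-space value separates the two patterns:
        // Bart\sSimpson rejects it, Bart\s+Simpson accepts it.
    }
}
```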
