"Filter text on regex."
I want to use a regex to find all text snippets that contain one or more of several words. If I use Filter Examples, set its condition to "Expression" and give it finds(Text, "(?i)\blootbox|micro\b"), it doesn't work, although the expression is syntactically correct.
If I remove |micro, it returns all snippets that contain lootbox. So why, with both words in the pattern, does it not return the examples that contain either of them? If I test the pattern with RapidMiner's regex checker on some dummy data, it matches both words. It only fails in "Filter Examples".
Kindly help!
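One way to see what this pattern does is to run it through java.util.regex directly. The sketch below assumes, without confirmation from the thread, that RapidMiner's finds() behaves like Java's Matcher.find(), succeeding if the pattern occurs anywhere in the text. The class and method names are hypothetical. The sketch shows that alternation binds loosest, so (?i)\blootbox|micro\b means "\blootbox" OR "micro\b". Grouping the alternatives puts both word boundaries around both words. In Java source every regex backslash is written twice.

```java
import java.util.List;
import java.util.regex.Pattern;

public class RegexPrecedenceDemo {
    // Sample rows taken from the test data in the process in the answer.
    static final List<String> ROWS = List.of(
        "This is a text about lootboxes",
        "This is a text about micro transactions",
        "This is a text about lootboxes and micro transactions",
        "And this is a text talking about other things");

    // True if the pattern occurs anywhere in the text (Matcher.find semantics,
    // assumed here to mirror RapidMiner's finds()).
    static boolean finds(String text, String regex) {
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        // Alternation has the lowest precedence: "\blootbox" OR "micro\b".
        String ungrouped = "(?i)\\blootbox|micro\\b";
        // The group applies both word boundaries to both alternatives.
        String grouped = "(?i)\\b(?:lootbox|micro)\\b";
        for (String row : ROWS) {
            System.out.println(finds(row, ungrouped) + " " + finds(row, grouped) + "  " + row);
        }
    }
}
```

Run as-is, the ungrouped pattern keeps the first three rows. The grouped pattern keeps only the two rows that contain micro as a whole word, because "lootboxes" has no word boundary after "lootbox". Since plain Java already matches both words with the ungrouped pattern, the problem in Filter Examples is probably elsewhere. The answer's expression doubles its backslashes (\\W), which suggests that RapidMiner's expression parser treats a backslash inside a string literal as an escape character. If so, the single \b in the question would not reach the regex engine as a word boundary. That is an inference, not something the thread confirms.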
Answers
Well, this expression actually IS an exact match ;-)
So I assume you want a match only if there is a non-word character before and after the word? Is that what you mean? In that case, the correct expression is finds([Text],".*\\W+lootbox\\W+.*|.*\\W+micro\\W+.*") (see the process below).
Please note, however, that in this case you would also no longer find plurals easily. For example, "lootboxes" would no longer trigger a match.
Cheers,
Ingo
<?xml version="1.0" encoding="UTF-8"?>
<process version="9.2.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.2.000" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="UTF-8"/>
    <process expanded="true">
      <operator activated="true" class="utility:create_exampleset" compatibility="9.2.000" expanded="true" height="68" name="Create ExampleSet" width="90" x="45" y="34">
        <parameter key="generator_type" value="comma separated text"/>
        <parameter key="number_of_examples" value="100"/>
        <parameter key="use_stepsize" value="false"/>
        <list key="function_descriptions"/>
        <parameter key="add_id_attribute" value="false"/>
        <list key="numeric_series_configuration"/>
        <list key="date_series_configuration"/>
        <list key="date_series_configuration (interval)"/>
        <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
        <parameter key="time_zone" value="SYSTEM"/>
        <parameter key="input_csv_text" value="Text&#10;This is a text about lootboxes&#10;This is a text about micro transactions&#10;This is a text about lootboxes and micro transactions&#10;And this is a text talking about other things"/>
        <parameter key="column_separator" value=","/>
        <parameter key="parse_all_as_nominal" value="false"/>
        <parameter key="decimal_point_character" value="."/>
        <parameter key="trim_attribute_names" value="true"/>
      </operator>
      <operator activated="true" class="filter_examples" compatibility="9.2.000" expanded="true" height="103" name="Filter Examples" width="90" x="179" y="34">
        <parameter key="parameter_expression" value="finds([Text],&quot;.*\\W+lootbox\\W+.*|.*\\W+micro\\W+.*&quot;)"/>
        <parameter key="condition_class" value="expression"/>
        <parameter key="invert_filter" value="false"/>
        <list key="filters_list"/>
        <parameter key="filters_logic_and" value="true"/>
        <parameter key="filters_check_metadata" value="true"/>
      </operator>
      <connect from_op="Create ExampleSet" from_port="output" to_op="Filter Examples" to_port="example set input"/>
      <connect from_op="Filter Examples" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
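The filter expression in this process can also be checked outside RapidMiner. This Java sketch rests on two assumptions: RapidMiner removes one level of backslash escaping (\\W becomes \W) before handing the pattern to java.util.regex, and finds() uses find semantics. WordFilterDemo, keep, and the fifth sample row are hypothetical. The sketch compares the expression with a \b-based variant of the same idea.

```java
import java.util.List;
import java.util.regex.Pattern;

public class WordFilterDemo {
    // The answer's expression with the RapidMiner string escaping removed (assumption).
    static final Pattern NON_WORD = Pattern.compile(".*\\W+lootbox\\W+.*|.*\\W+micro\\W+.*");

    // Same intent written with word boundaries, which also match at the start/end of the text.
    static final Pattern BOUNDARY = Pattern.compile("\\b(?:lootbox|micro)\\b");

    // True if the pattern occurs anywhere in the text.
    static boolean keep(Pattern p, String text) {
        return p.matcher(text).find();
    }

    public static void main(String[] args) {
        List<String> rows = List.of(
            "This is a text about lootboxes",
            "This is a text about micro transactions",
            "This is a text about lootboxes and micro transactions",
            "And this is a text talking about other things",
            "micro transactions at the start"); // hypothetical extra row
        for (String row : rows) {
            System.out.printf("%-5b %-5b %s%n", keep(NON_WORD, row), keep(BOUNDARY, row), row);
        }
    }
}
```

Both patterns keep the second and third sample rows and drop the row with "lootboxes", which matches the plural caveat in the answer. They differ only at the edges of the text. \W+ needs at least one non-word character, so a word at the very start or end of the text is not matched. \b also matches at the start and end of the input.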