Only search for a specific keyword from a text
Hello,
I want to search for specific keywords in a text and assign each match a type. I am using the Generate Attributes operator and writing a function expression to search for the keywords. My problem is this: the keyword list contains words like "liar", "lies", and "lied", but the expression I am using also picks up words like "families" and "familiar". I only want to match whole words such as "lies" or "liar", not "families" or "familiar".
This was my approach:
if(matches(Notes,".*lies.*"),"Lies",
if(matches(Notes,".*liar.*"),"Lies",
if(matches(Notes,".*lied.*"),"Lies",
if(matches(Notes,".*lying.*"),"Lying","None"))))
Any help is appreciated. Thanks
Best Answer
MarcoBarradas Administrator, Employee, RapidMiner Certified Analyst, Member Posts: 272 Unicorn
Hello @tahsin
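You could use the Map operator with regular expressions enabled to replace the whole text of an attribute with a label whenever it contains one of your keywords as a whole word. You may want to create a copy of the attribute first so that the original text is preserved.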
I'm pasting a process that could help you get there.
Since you are doing some text processing, I would recommend going through the Text and Web Mining tutorials at the Academy:
https://academy.rapidminer.com/learn/course/text-and-web-mining-with-rapidminer/text-and-web-mining/lets-get-started
<?xml version="1.0" encoding="UTF-8"?>
<process version="9.9.002">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="9.9.002" expanded="true" name="Process">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="-1"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="utility:create_exampleset" compatibility="9.9.002" expanded="true" height="68" name="Create ExampleSet" width="90" x="246" y="85">
        <parameter key="generator_type" value="comma separated text"/>
        <parameter key="number_of_examples" value="100"/>
        <parameter key="use_stepsize" value="false"/>
        <list key="function_descriptions"/>
        <parameter key="add_id_attribute" value="false"/>
        <list key="numeric_series_configuration"/>
        <list key="date_series_configuration"/>
        <list key="date_series_configuration (interval)"/>
        <parameter key="date_format" value="yyyy-MM-dd HH:mm:ss"/>
        <parameter key="time_zone" value="SYSTEM"/>
        <parameter key="input_csv_text" value="Text He said a couple of lies to us The wife pointed he was a liar and that was the reason for it Person lied about the reason he was at that place He was lying all the time"/>
        <parameter key="column_separator" value=","/>
        <parameter key="parse_all_as_nominal" value="false"/>
        <parameter key="decimal_point_character" value="."/>
        <parameter key="trim_attribute_names" value="true"/>
      </operator>
      <operator activated="true" class="generate_copy" compatibility="9.9.002" expanded="true" height="82" name="Generate Copy" width="90" x="380" y="85">
        <parameter key="attribute_name" value="Text"/>
        <parameter key="new_name" value="Type"/>
      </operator>
      <operator activated="true" class="map" compatibility="9.9.002" expanded="true" height="82" name="Map" width="90" x="514" y="85">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Type"/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="attribute_value"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="time"/>
        <parameter key="block_type" value="attribute_block"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_matrix_row_start"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="false"/>
        <list key="value_mappings">
          <parameter key=".+\b(lies|liar|lied)\b.+" value="Liar"/>
          <parameter key=".+\b(lying)\b.+" value="Lying"/>
        </list>
        <parameter key="consider_regular_expressions" value="true"/>
        <parameter key="add_default_mapping" value="false"/>
      </operator>
      <connect from_op="Create ExampleSet" from_port="output" to_op="Generate Copy" to_port="example set input"/>
      <connect from_op="Generate Copy" from_port="example set output" to_op="Map" to_port="example set input"/>
      <connect from_op="Map" from_port="example set output" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
    </process>
  </operator>
</process>
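The key part of the process above is the `\b` word boundary in the Map patterns. A quick way to see why it excludes "families" and "familiar" is to try the same patterns outside RapidMiner. This is only a minimal sketch in plain Python: the `RULES` list and the `classify` helper are hypothetical names made up for illustration, and the labels simply copy the ones used in the Map operator.

```python
import re

# Hypothetical rule table: (compiled pattern, label). The patterns mirror the
# Map operator's value mappings; \b only matches at the edge of a word.
RULES = [
    (re.compile(r"\b(?:lies|liar|lied)\b"), "Liar"),
    (re.compile(r"\blying\b"), "Lying"),
]


def classify(note: str, default: str = "None") -> str:
    """Return the label of the first rule whose keyword occurs as a whole word."""
    for pattern, label in RULES:
        if pattern.search(note):
            return label
    return default


# A plain substring pattern finds "lies" inside "families" ...
substring_hit = re.search(r"lies", "Our families met last week") is not None
# ... while the word-boundary version does not.
boundary_hit = re.search(r"\blies\b", "Our families met last week") is not None

notes = [
    "He said a couple of lies to us",
    "Our families are familiar with it",
    "He was lying all the time",
]
for note in notes:
    print(f"{classify(note):5} | {note}")
```

The matching here is case sensitive, so "Lies" at the start of a sentence would need `re.IGNORECASE` or a lowercased copy of the text.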
Answers
This is actually how I do it in Python:
df['Type'] = np.where(df.Notes.str.contains(r'\b(?:lies|liar|lied)\b'), 'Lies',
                      np.where(df.Notes.str.contains(r'\blying\b'), 'Lying', 'None'))
I am not sure how to do it here in RapidMiner.
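For anyone comparing against the pandas route, here is a small self-contained sketch of the same word-boundary idea. It swaps the nested `np.where` for `np.select`, which tends to stay readable as more keyword types are added. The sample rows are made up for illustration; only the `Notes` and `Type` column names and the labels come from this thread.

```python
import numpy as np
import pandas as pd

# Made-up sample data; "Notes" is the text column named in the question.
df = pd.DataFrame({
    "Notes": [
        "He said a couple of lies to us",
        "Our families met last week",
        "He was lying all the time",
        "She looked familiar",
        "lied",
    ]
})

# One boolean mask per keyword type. Non-capturing groups (?:...) avoid the
# pandas warning about match groups; na=False treats missing notes as no match.
conditions = [
    df["Notes"].str.contains(r"\b(?:lies|liar|lied)\b", na=False).to_numpy(dtype=bool),
    df["Notes"].str.contains(r"\blying\b", na=False).to_numpy(dtype=bool),
]
labels = ["Lies", "Lying"]

# np.select takes the first matching condition per row, else the default.
df["Type"] = np.select(conditions, labels, default="None")

print(df)
```

The order of `conditions` matters: a note that contains both "lied" and "lying" gets the first label, "Lies".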