Sentence extraction

sachinmphasis Member Posts: 4 Contributor I
edited November 2018 in Help

How can I extract a sentence that contains a given keyword using operators?


  • Thomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Are you referring to tokenization? The Tokenize operator lets you tokenize based on "linguistic sentences". Just select that mode in the parameter window.
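To make the idea concrete outside RapidMiner, here is a minimal Python sketch of the same two steps: split the text into sentences, then keep the sentences that mention the keyword. The splitter is a naive stand-in and is not how the Tokenize operator works internally; the function name `sentences_with_keyword` and the sample text are made up for illustration.

```python
import re

def sentences_with_keyword(text, keyword):
    """Split text into rough sentences, then keep those containing keyword."""
    # Naive splitter (an assumption, not RapidMiner's algorithm):
    # break after '.', '!' or '?' when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    # Case-insensitive substring test, so "apple" also matches "pineapple".
    return [s for s in sentences if keyword.lower() in s.lower()]

text = "I like pears. An apple a day keeps the doctor away. Bananas are yellow!"
print(sentences_with_keyword(text, "apple"))
```

Abbreviations such as "e.g." will be split in the wrong place by this simple rule, which is one reason to prefer a proper linguistic sentence tokenizer on real text.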

  • aruberutou Member Posts: 23 Contributor II

    Maybe he's referring to the "Extract Information" operator?

    You have to place it inside a "Process Documents" operator so that it receives your documents. Once it is there, select your extraction options and run the process. Just make sure that "add meta information" is checked. Here's a sample process.


    <?xml version="1.0" encoding="UTF-8"?><process version="7.2.000">
      <operator activated="true" class="process" compatibility="7.2.000" expanded="true" name="Process">
        <process expanded="true">
          <operator activated="true" class="generate_nominal_data" compatibility="7.2.000" expanded="true" height="68" name="Generate Nominal Data" width="90" x="112" y="85"/>
          <operator activated="true" class="nominal_to_text" compatibility="7.2.000" expanded="true" height="82" name="Nominal to Text" width="90" x="246" y="85"/>
          <operator activated="true" class="text:process_document_from_data" compatibility="7.2.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="380" y="85">
            <parameter key="create_word_vector" value="false"/>
            <parameter key="keep_text" value="true"/>
            <list key="specify_weights"/>
            <process expanded="true">
              <operator activated="true" class="text:extract_information" compatibility="7.2.000" expanded="true" height="68" name="Extract Information" width="90" x="179" y="34">
                <list key="string_machting_queries">
                  <parameter key="test" value="value1.value3"/>
                </list>
                <list key="regular_expression_queries"/>
                <list key="regular_region_queries"/>
                <list key="xpath_queries"/>
                <list key="namespaces"/>
                <list key="index_queries"/>
                <list key="jsonpath_queries"/>
                <description align="center" color="green" colored="true" width="126">Define attribute names here</description>
              </operator>
              <connect from_port="document" to_op="Extract Information" to_port="document"/>
              <connect from_op="Extract Information" from_port="document" to_port="document 1"/>
              <portSpacing port="source_document" spacing="0"/>
              <portSpacing port="sink_document 1" spacing="0"/>
              <portSpacing port="sink_document 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Generate Nominal Data" from_port="output" to_op="Nominal to Text" to_port="example set input"/>
          <connect from_op="Nominal to Text" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
          <connect from_op="Process Documents from Data" from_port="example set" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>

    Hope that helps!

  • sachinmphasis Member Posts: 4 Contributor I



    Thanks, friends, for your input. I actually used the Cut Document operator, specified the regular expression "([^.]*?apple[^.]*\.)" in it, and was able to extract the sentence.
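For readers who want to see what that pattern captures, here is a small Python sketch that applies the same regular expression outside RapidMiner. It only illustrates the regex itself, not the Cut Document operator; the helper name `sentences_containing` and the sample text are hypothetical, and the keyword is escaped so that other search terms can be substituted safely.

```python
import re

def sentences_containing(text, keyword):
    """Return every period-terminated run of text that contains keyword."""
    # Same shape as the pattern in the post: a lazy run of non-period
    # characters, the keyword, more non-period characters, then a period.
    pattern = re.compile(r"([^.]*?" + re.escape(keyword) + r"[^.]*\.)")
    # Matches begin right after the previous period, so trim the leading space.
    return [m.strip() for m in pattern.findall(text)]

text = "I like pears. An apple a day keeps the doctor away. Bananas are yellow."
print(sentences_containing(text, "apple"))
```

Because the pattern treats every period as a sentence boundary, it cuts sentences short at abbreviations or decimal numbers, and it does not match sentences that end in "?" or "!".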


    Thanks and Regards,


  • bhupendra_patil Administrator, Employee, Member Posts: 168 RM Data Scientist

    Leaving this here for future users:

    Here is a KB article describing other techniques



