"[solved] struggling with word list feature"

sana Member Posts: 2 Contributor I
edited June 2019 in Help

Can anyone help me, please? I am struggling with the text processing section. I am able to tokenize, but I never get any results when I try to create a word frequency list. It can't be that difficult, as there are lots of basic tools on the web for calculating word frequency lists.

The only thing that has worked for me so far is the Process Documents from Files operator, and even that shows results for only one of the two directories I chose.

Just now I ran the Process Documents operator again with a mix of things (Tokenize, Transform Cases, and Filter Stopwords (English)) but got no output. Here is the process flow. I don't have any programming background, so I am just following the way other posts have been filed.

Hope someone can help me out here.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.1.014">
 <operator activated="true" class="process" compatibility="5.1.014" expanded="true" name="Process">
   <parameter key="logfile" value="C:\Users\user3\Desktop\dir text\5.txt"/>
   <parameter key="resultfile" value="C:\Users\user3\Desktop\dir text\New Text Document.txt"/>
   <process expanded="true" height="100" width="145">
     <operator activated="true" class="text:process_documents" compatibility="5.1.004" expanded="true" height="76" name="Process Documents" width="90" x="45" y="30">
       <process expanded="true" height="414" width="762">
         <operator activated="true" class="text:tokenize" compatibility="5.1.004" expanded="true" height="60" name="Tokenize" width="90" x="31" y="27"/>
         <operator activated="true" class="text:transform_cases" compatibility="5.1.004" expanded="true" height="60" name="Transform Cases" width="90" x="181" y="26"/>
         <operator activated="true" class="text:filter_stopwords_english" compatibility="5.1.004" expanded="true" height="60" name="Filter Stopwords (English)" width="90" x="345" y="25"/>
         <connect from_port="document" to_op="Tokenize" to_port="document"/>
         <connect from_op="Tokenize" from_port="document" to_op="Transform Cases" to_port="document"/>
         <connect from_op="Transform Cases" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
         <connect from_op="Filter Stopwords (English)" from_port="document" to_port="document 1"/>
         <portSpacing port="source_document" spacing="0"/>
         <portSpacing port="sink_document 1" spacing="0"/>
         <portSpacing port="sink_document 2" spacing="0"/>
       </process>
     </operator>
     <connect from_op="Process Documents" from_port="example set" to_port="result 1"/>
     <connect from_op="Process Documents" from_port="word list" to_port="result 2"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
     <portSpacing port="sink_result 2" spacing="0"/>
     <portSpacing port="sink_result 3" spacing="0"/>
   </process>
 </operator>
</process>


    Nils_Woehler Member Posts: 463 Maven
    Hi sana,

    Is the XML code you have posted the whole process? In that case, you aren't getting any results because you aren't providing any documents to the "Process Documents" operator.
    You can use the "Read Document" operator to load documents. Connect it to the "Process Documents" operator and your results shouldn't be empty.
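
    As a rough sketch of that suggestion, the process below places a "Read Document" operator in front of "Process Documents" and wires its output into the documents input. The operator key `text:read_document`, its `file` parameter, and the port names `output` and `documents 1` are assumptions based on how the Text Processing extension usually serializes these operators, and the file path is only a placeholder. Layout attributes (height, width, x, y) are left out on the assumption that they are optional.

    ```xml
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.1.014">
      <operator activated="true" class="process" compatibility="5.1.014" expanded="true" name="Process">
        <process expanded="true">
          <!-- Assumed operator key and parameter name; replace the placeholder path with a real text file -->
          <operator activated="true" class="text:read_document" compatibility="5.1.004" expanded="true" name="Read Document">
            <parameter key="file" value="C:\path\to\your\document.txt"/>
          </operator>
          <operator activated="true" class="text:process_documents" compatibility="5.1.004" expanded="true" name="Process Documents">
            <process expanded="true">
              <!-- Same inner chain as in the original post: tokenize, lowercase, drop English stopwords -->
              <operator activated="true" class="text:tokenize" compatibility="5.1.004" expanded="true" name="Tokenize"/>
              <operator activated="true" class="text:transform_cases" compatibility="5.1.004" expanded="true" name="Transform Cases"/>
              <operator activated="true" class="text:filter_stopwords_english" compatibility="5.1.004" expanded="true" name="Filter Stopwords (English)"/>
              <connect from_port="document" to_op="Tokenize" to_port="document"/>
              <connect from_op="Tokenize" from_port="document" to_op="Transform Cases" to_port="document"/>
              <connect from_op="Transform Cases" from_port="document" to_op="Filter Stopwords (English)" to_port="document"/>
              <connect from_op="Filter Stopwords (English)" from_port="document" to_port="document 1"/>
            </process>
          </operator>
          <!-- The connection missing from the original process (port names assumed) -->
          <connect from_op="Read Document" from_port="output" to_op="Process Documents" to_port="documents 1"/>
          <connect from_op="Process Documents" from_port="example set" to_port="result 1"/>
          <connect from_op="Process Documents" from_port="word list" to_port="result 2"/>
        </process>
      </operator>
    </process>
    ```

    If the port or parameter names differ in your version, dragging the operators into the process view and connecting them by hand should give the same structure.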

    sana Member Posts: 2 Contributor I
    Hi Nils,

    Thanks a lot.

    I guess I have to play around a lot right now :)

    It's nice to have people to count on, and great work is happening here.

    Please do keep it up.
