Dear all, I need help: I have a log file (a text file) containing about 500 lines, and I need to count the numbers

Ahmedte1234 Member Posts: 3 Contributor I
edited April 2020 in Help
The lines in the file look like this:
1 Jan 10:00 the chassis normal status 
1 Jan 10:30 log I'd lost 
1 Jan 12:30 interface down 
1 Jan 1:00 power off system 
2 Jan 11:00 the high temperature 
2 Jan 2:00 the user log in successfully 
There are a lot of statements like these; some are useful and some are not.
The output should look like this:
Down appears 10 times
Power off appears 1 time
Interface down appears 3 times
I need an algorithm that suggests the most frequent words and how many times each appears in the file. I also need a way to reduce the lines using a certain pattern.

Best Answer


    varunm1 Moderator, Member Posts: 1,207 Unicorn
    edited April 2019
    Hello @Ahmedte1234

    First, install the "Text Processing" and "Web Mining" extensions from the RapidMiner Marketplace. To count how often words are repeated in your document, you first need to read your text file into RapidMiner. Then you can use the XML code below (click on show) to extract details about your data; attach your text file instead of the one in this XML.

    To use this XML, copy the code from here and open a new blank process in RapidMiner. Enable the XML panel via View --> Show Panel --> XML in the menu bar, paste the code into the XML panel of the new process, then click the green tick mark, which will show you the process as seen in the figure below. Once you have the process, delete the "Retrieve files" operator and attach your own file that you imported into RapidMiner.

    I also attached the result of the process based on some of the data you provided. The term occurrences value gives the number of times each word is repeated in your file. There are also multiple community samples that help in understanding how TF-IDF works.

    <?xml version="1.0" encoding="UTF-8"?><process version="9.2.001">
      <operator activated="true" class="process" compatibility="9.2.001" expanded="true" name="Process">
        <parameter key="logverbosity" value="init"/>
        <parameter key="random_seed" value="2001"/>
        <parameter key="send_mail" value="never"/>
        <parameter key="notification_email" value=""/>
        <parameter key="process_duration_for_mail" value="30"/>
        <parameter key="encoding" value="SYSTEM"/>
        <process expanded="true">
          <operator activated="true" class="retrieve" compatibility="9.2.001" expanded="true" height="68" name="Retrieve files" width="90" x="112" y="85">
            <parameter key="repository_entry" value="//Local Repository/RapidMIner/files"/>
          </operator>
          <operator activated="true" class="nominal_to_text" compatibility="9.2.001" expanded="true" height="82" name="Nominal to Text" width="90" x="246" y="136">
            <parameter key="attribute_filter_type" value="all"/>
            <parameter key="attribute" value=""/>
            <parameter key="attributes" value=""/>
            <parameter key="use_except_expression" value="false"/>
            <parameter key="value_type" value="nominal"/>
            <parameter key="use_value_type_exception" value="false"/>
            <parameter key="except_value_type" value="file_path"/>
            <parameter key="block_type" value="single_value"/>
            <parameter key="use_block_type_exception" value="false"/>
            <parameter key="except_block_type" value="single_value"/>
            <parameter key="invert_selection" value="false"/>
            <parameter key="include_special_attributes" value="false"/>
          </operator>
          <operator activated="true" class="text:process_document_from_data" compatibility="8.1.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="447" y="136">
            <parameter key="create_word_vector" value="true"/>
            <parameter key="vector_creation" value="TF-IDF"/>
            <parameter key="add_meta_information" value="true"/>
            <parameter key="keep_text" value="false"/>
            <parameter key="prune_method" value="none"/>
            <parameter key="prune_below_percent" value="3.0"/>
            <parameter key="prune_above_percent" value="30.0"/>
            <parameter key="prune_below_rank" value="0.05"/>
            <parameter key="prune_above_rank" value="0.95"/>
            <parameter key="datamanagement" value="double_sparse_array"/>
            <parameter key="data_management" value="auto"/>
            <parameter key="select_attributes_and_weights" value="false"/>
            <list key="specify_weights"/>
            <process expanded="true">
              <operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize" width="90" x="179" y="85">
                <parameter key="mode" value="non letters"/>
                <parameter key="characters" value=".:"/>
                <parameter key="language" value="English"/>
                <parameter key="max_token_length" value="3"/>
              </operator>
              <connect from_port="document" to_op="Tokenize" to_port="document"/>
              <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
              <portSpacing port="source_document" spacing="0"/>
              <portSpacing port="sink_document 1" spacing="0"/>
              <portSpacing port="sink_document 2" spacing="0"/>
            </process>
          </operator>
          <connect from_op="Retrieve files" from_port="output" to_op="Nominal to Text" to_port="example set input"/>
          <connect from_op="Nominal to Text" from_port="example set output" to_op="Process Documents from Data" to_port="example set"/>
          <connect from_op="Process Documents from Data" from_port="example set" to_port="result 1"/>
          <connect from_op="Process Documents from Data" from_port="word list" to_port="result 2"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
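
For anyone who wants to cross-check the counts outside RapidMiner, the term-occurrence idea can be sketched in plain Python. This is only a minimal sketch and not the process above: it assumes every line starts with a day, a three-letter month and a time (as in the sample lines from the question), strips that prefix, lowercases the text and splits on non-letters. The names `message_of`, `term_occurrences` and `phrase_occurrences` are made up for illustration, and the phrase list is just an example of "reducing with a certain pattern".

```python
import re
from collections import Counter

# Sample lines copied from the question; in practice read them from the log file.
LOG_LINES = [
    "1 Jan 10:00 the chassis normal status",
    "1 Jan 10:30 log I'd lost",
    "1 Jan 12:30 interface down",
    "1 Jan 1:00 power off system",
    "2 Jan 11:00 the high temperature",
    "2 Jan 2:00 the user log in successfully",
]

# Assumed layout of each line: "<day> <month> <hh:mm> <message>".
PREFIX = re.compile(r"^\s*\d{1,2}\s+[A-Za-z]{3}\s+\d{1,2}:\d{2}\s+")


def message_of(line):
    """Drop the leading date and time so only the lowercased message remains."""
    return PREFIX.sub("", line).strip().lower()


def term_occurrences(lines):
    """Count single words after splitting each message on non-letter characters."""
    counts = Counter()
    for line in lines:
        counts.update(t for t in re.split(r"[^a-z]+", message_of(line)) if t)
    return counts


def phrase_occurrences(lines, phrases):
    """Count how many messages contain each phrase (plain substring match)."""
    messages = [message_of(line) for line in lines]
    return {p: sum(p.lower() in m for m in messages) for p in phrases}


if __name__ == "__main__":
    print(term_occurrences(LOG_LINES).most_common(5))
    print(phrase_occurrences(LOG_LINES, ["interface down", "power off", "down"]))
```

Note that the substring match is deliberately simple: a phrase such as "down" would also match a longer word that contains it, so a stricter word-boundary pattern may be needed for real logs.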

    Hope this helps. Please let me know if you are looking for something different.

    Be Safe. Follow precautions and Maintain Social Distancing

    Ahmedte1234 Member Posts: 3 Contributor I
    Good, but in my research I need to use an algorithm like Apriori or FP-Growth.
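
As a rough illustration of what Apriori would do on such a log, here is a minimal pure-Python sketch. It is not a RapidMiner operator and not FP-Growth; it treats each log message as a "transaction" made of its words and returns every word set that occurs in at least `min_support` messages. The function name `apriori` and the messages in `MESSAGES` are hypothetical, written in the style of the lines in the question rather than taken from a real log.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least min_support transactions."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1: the candidates are all single items.
    items = {item for t in transactions for item in t}
    candidates = {frozenset([item]) for item in items}
    frequent = {}
    k = 1
    while candidates:
        # Support counting: scan every transaction for every candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Join step: merge frequent itemsets into candidates one item larger.
        k += 1
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


# Hypothetical messages (timestamps already removed), one per log line.
MESSAGES = [
    "interface down",
    "interface down",
    "power off system",
    "interface down",
    "the high temperature",
]

if __name__ == "__main__":
    result = apriori([m.split() for m in MESSAGES], min_support=2)
    for itemset, count in sorted(result.items(), key=lambda kv: -kv[1]):
        print(sorted(itemset), count)
```

With these made-up messages, only "interface", "down" and the pair of both reach the support threshold of 2; on a real 500-line log the threshold would need tuning.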