'nominal correlation matrix' from example set

imkeimke Member Posts: 12 Learner III
edited October 2019 in Help

Hello,

I have already run a text mining process and now have the ExampleSet produced by Process Documents from Data. From it I want to calculate how often two words occur in the same text. At first I thought I could use the Correlation Matrix operator, but that does not work. So I tried clustering in Auto Model, but there I can only select two entries of the ExampleSet, and I want to know this for all the words. Then I thought I could add the X-Means operator to my process, but my data set is way too big for X-Means, and with k-Means I am not getting the results I want (there is no correlation matrix like the one Auto Model shows).

So my question is: is there a way to create a correlation matrix from the ExampleSet?

Thank you

Imke

Best Answer

  • MartinLiebigMartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,528 RM Data Scientist
    Solution Accepted

    Hi @imke,

    it feels to me like this is a case for FP-Growth or for n-grams. See the attached example.

     

    BR,

    Martin

     

    <?xml version="1.0" encoding="UTF-8"?><process version="9.0.002">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="9.0.002" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="text:create_document" compatibility="8.1.000" expanded="true" height="68" name="Create Document" width="90" x="112" y="85">
    <parameter key="text" value="one two three"/>
    </operator>
    <operator activated="true" class="text:create_document" compatibility="8.1.000" expanded="true" height="68" name="Create Document (2)" width="90" x="112" y="238">
    <parameter key="text" value="two three four"/>
    </operator>
    <operator activated="true" class="collect" compatibility="9.0.002" expanded="true" height="103" name="Collect" width="90" x="246" y="187"/>
    <operator activated="true" class="text:process_documents" compatibility="8.1.000" expanded="true" height="103" name="Process Documents" width="90" x="380" y="187">
    <parameter key="vector_creation" value="Binary Term Occurrences"/>
    <process expanded="true">
    <operator activated="true" class="text:tokenize" compatibility="8.1.000" expanded="true" height="68" name="Tokenize" width="90" x="380" y="187"/>
    <connect from_port="document" to_op="Tokenize" to_port="document"/>
    <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
    <portSpacing port="source_document" spacing="0"/>
    <portSpacing port="sink_document 1" spacing="0"/>
    <portSpacing port="sink_document 2" spacing="0"/>
    </process>
    <description align="center" color="transparent" colored="false" width="126">Binary term occs</description>
    </operator>
    <operator activated="true" class="numerical_to_binominal" compatibility="9.0.002" expanded="true" height="82" name="Numerical to Binominal" width="90" x="581" y="187"/>
    <operator activated="true" class="concurrency:fp_growth" compatibility="9.0.002" expanded="true" height="82" name="FP-Growth" width="90" x="715" y="187">
    <enumeration key="must_contain_list"/>
    </operator>
    <connect from_op="Create Document" from_port="output" to_op="Collect" to_port="input 1"/>
    <connect from_op="Create Document (2)" from_port="output" to_op="Collect" to_port="input 2"/>
    <connect from_op="Collect" from_port="collection" to_op="Process Documents" to_port="documents 1"/>
    <connect from_op="Process Documents" from_port="example set" to_op="Numerical to Binominal" to_port="example set input"/>
    <connect from_op="Numerical to Binominal" from_port="example set output" to_op="FP-Growth" to_port="example set"/>
    <connect from_op="FP-Growth" from_port="frequent sets" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
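For readers who want to see what the pairwise part of this result means, here is a minimal Python sketch, not taken from the thread, that counts for every word pair the number of documents containing both words. It reuses the two toy documents from the process above and treats each word as present or absent per document, as with Binary Term Occurrences. The function name `cooccurrence_counts` and the whitespace tokenization are assumptions for illustration only; FP-Growth itself uses a tree-based algorithm and also returns larger item sets.

```python
from collections import Counter
from itertools import combinations

# The same two toy documents as in the process above.
documents = ["one two three", "two three four"]


def cooccurrence_counts(docs):
    """Count, for every unordered word pair, how many documents contain both words."""
    counts = Counter()
    for text in docs:
        # Binary term occurrences: each word counts at most once per document.
        # Splitting on whitespace is a simplification of the Tokenize operator.
        words = sorted(set(text.split()))
        # Every unordered pair of distinct words in this document co-occurs once.
        counts.update(combinations(words, 2))
    return counts


pair_counts = cooccurrence_counts(documents)
for (a, b), n in sorted(pair_counts.items()):
    support = n / len(documents)
    print(f"{a} - {b}: {n} of {len(documents)} documents (support {support:.2f})")
```

Dividing a pair's count by the number of documents gives its support, which is the quantity a minimum-support threshold filters on. Here only the pair "three"/"two" appears in both documents.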

Answers

  • imkeimke Member Posts: 12 Learner III

    Hello Martin,

    that's quite good, but I think it is not the right solution for me. N-grams only capture words that directly follow each other, and I want to know which words occur together in which text, not only directly after one another. Do you know what I mean?

    Greetings

    Imke
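The distinction drawn here can be illustrated with a small Python sketch, assuming whitespace tokenization and the hypothetical helper names `bigrams` and `cooccurring_pairs`. Bigrams only pair words that stand next to each other, while co-occurrence pairs every two distinct words found in the same text.

```python
from itertools import combinations


def bigrams(tokens):
    """Pairs of directly adjacent tokens, in reading order."""
    return list(zip(tokens, tokens[1:]))


def cooccurring_pairs(tokens):
    """All unordered pairs of distinct words that appear in the same text."""
    return list(combinations(sorted(set(tokens)), 2))


tokens = "one two three".split()
print(bigrams(tokens))
print(cooccurring_pairs(tokens))
```

For the text "one two three", the pair of "one" and "three" shows up only in the co-occurrence result, because the two words are never adjacent.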

  • imkeimke Member Posts: 12 Learner III

    Hello Martin,

    I need to correct myself. With the right settings FP-Growth is perfect for me!

    Thanks a lot!

    Imke
