
TFIDF per Class/Label?

CaptainChaos Member Posts: 17 Contributor II
edited November 2018 in Help
Hi guys,
The process posted below is the one I am working with.
 

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.000" expanded="true" name="Process">
    <process expanded="true" height="528" width="648">
      <operator activated="true" class="read_excel" compatibility="5.2.000" expanded="true" height="60" name="Read Excel" width="90" x="45" y="75">
        <parameter key="excel_file" value="C:\Dokumente und Einstellungen\rrojas\My Documents\myData\Spreadsheetversion1.0Forum.xls"/>
        <parameter key="imported_cell_range" value="A1:AG1500"/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations">
          <parameter key="0" value="Name"/>
        </list>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="Productnumber.true.integer.id"/>
          <parameter key="1" value="Class.true.binominal.label"/>
          <parameter key="2" value="Error 1.true.integer.attribute"/>
          <parameter key="3" value="Error 2.true.integer.attribute"/>
          <parameter key="4" value="Error 3.true.integer.attribute"/>
          <parameter key="5" value="Error 4.true.integer.attribute"/>
          <parameter key="6" value="Error 5.true.integer.attribute"/>
          <parameter key="7" value="Error 6.true.integer.attribute"/>
          <parameter key="8" value="Error 7.true.integer.attribute"/>
          <parameter key="9" value="Error 8.true.integer.attribute"/>
          <parameter key="10" value="Error 9.true.integer.attribute"/>
          <parameter key="11" value="Error 10.true.integer.attribute"/>
          <parameter key="12" value="Error 11.true.integer.attribute"/>
          <parameter key="13" value="Error 12.true.integer.attribute"/>
          <parameter key="14" value="Error 13.true.integer.attribute"/>
          <parameter key="15" value="Error 14.true.integer.attribute"/>
          <parameter key="16" value="Error 15.true.integer.attribute"/>
          <parameter key="17" value="Error 16.true.integer.attribute"/>
          <parameter key="18" value="Error 17.true.integer.attribute"/>
          <parameter key="19" value="Error 18.true.integer.attribute"/>
          <parameter key="20" value="Error 19.true.integer.attribute"/>
          <parameter key="21" value="Error 20.true.integer.attribute"/>
          <parameter key="22" value="Error 21.true.integer.attribute"/>
          <parameter key="23" value="Error 22.true.integer.attribute"/>
          <parameter key="24" value="Error 23.true.integer.attribute"/>
          <parameter key="25" value="Error 24.true.integer.attribute"/>
          <parameter key="26" value="Error 25.true.integer.attribute"/>
          <parameter key="27" value="Error 26.true.integer.attribute"/>
          <parameter key="28" value="Error 27.true.integer.attribute"/>
          <parameter key="29" value="Error 28.true.integer.attribute"/>
          <parameter key="30" value="Error 29.true.integer.attribute"/>
          <parameter key="31" value="Error 30.true.integer.attribute"/>
          <parameter key="32" value="Total Errors.true.integer.attribute"/>
        </list>
      </operator>
      <operator activated="true" class="generate_tfidf" compatibility="5.2.000" expanded="true" height="76" name="Generate TFIDF" width="90" x="246" y="165"/>
      <operator activated="true" class="data_to_similarity" compatibility="5.2.000" expanded="true" height="76" name="Data to Similarity" width="90" x="380" y="75">
        <parameter key="measure_types" value="NumericalMeasures"/>
        <parameter key="numerical_measure" value="CosineSimilarity"/>
      </operator>
      <connect from_op="Read Excel" from_port="output" to_op="Generate TFIDF" to_port="example set input"/>
      <connect from_op="Generate TFIDF" from_port="example set output" to_op="Data to Similarity" to_port="example set"/>
      <connect from_op="Data to Similarity" from_port="similarity" to_port="result 2"/>
      <connect from_op="Data to Similarity" from_port="example set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>


First I will describe my data. The set consists of 1500 examples, each representing a product identified by its product number (the id). The products (examples) are grouped into two different classes/labels. As you can see in my process, I calculated the TFIDF score of every attribute (Error 1 ... Error x) for every example (product).
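
For reference, a per-example TF-IDF over count attributes can be sketched in Python as follows. This is only an illustration under assumptions: the formula (row-normalized counts multiplied by log(N / df)) is one common formulation and the Generate TFIDF operator may normalize differently, and the function name `tfidf_rows` and the toy data are made up.

```python
import math

def tfidf_rows(rows):
    """rows: list of lists of non-negative counts, one inner list per example."""
    n_examples = len(rows)
    n_attrs = len(rows[0])
    # Document frequency: number of examples in which the attribute is non-zero.
    df = [sum(1 for r in rows if r[j] > 0) for j in range(n_attrs)]
    # Inverse document frequency; set to 0 for attributes that never occur.
    idf = [math.log(n_examples / d) if d else 0.0 for d in df]
    out = []
    for r in rows:
        total = sum(r)
        # Term frequency is the count normalized by the row total.
        out.append([(c / total) * idf[j] if total else 0.0
                    for j, c in enumerate(r)])
    return out

# Toy data (made up): 3 products x 3 error attributes.
rows = [
    [2, 0, 1],
    [0, 3, 1],
    [1, 0, 1],
]
scores = tfidf_rows(rows)
```

In this toy data the third attribute occurs in every example, so its IDF is log(1) = 0 and its score is 0 everywhere; attributes that occur in few examples get the highest weights.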
In addition, I would like to know two more things:

1. Is it possible to calculate some kind of "TFIDF score" not just for a single example but for a whole class/label, so that I know which attributes are most characteristic of a label/class?

2. I would like to find out which combinations of attributes are significant for a label/class.
2a. I would like to find out which correlations exist between attributes in relation to the class, so that one can make assumptions such as: if an example has a certain combination of attributes, it belongs to a certain class.
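
To make the two questions more concrete, here is a rough Python sketch of what such results could look like. It is not a RapidMiner feature, just one possible interpretation: for question 1 it averages the per-example scores within each label, and for question 2 it computes, for every pair of attributes, the rule confidence P(label = target | both attributes non-zero). The function names `class_means` and `pair_confidence` and the toy data are hypothetical.

```python
from collections import defaultdict
from itertools import combinations

def class_means(scores, labels):
    """Average each attribute's score over the examples of each label."""
    sums = {}
    counts = defaultdict(int)
    for row, lab in zip(scores, labels):
        counts[lab] += 1
        acc = sums.setdefault(lab, [0.0] * len(row))
        for j, v in enumerate(row):
            acc[j] += v
    return {lab: [s / counts[lab] for s in acc] for lab, acc in sums.items()}

def pair_confidence(rows, labels, target):
    """For each attribute pair: P(label == target | both attributes non-zero)."""
    result = {}
    n_attrs = len(rows[0])
    for i, j in combinations(range(n_attrs), 2):
        # Labels of all examples in which both attributes occur.
        hits = [lab for r, lab in zip(rows, labels) if r[i] > 0 and r[j] > 0]
        if hits:  # skip pairs that never occur together
            result[(i, j)] = sum(1 for lab in hits if lab == target) / len(hits)
    return result

# Toy data (made up): 4 products, 3 attribute scores each, two labels.
rows = [
    [0.5, 0.0, 0.2],
    [0.3, 0.0, 0.4],
    [0.0, 0.6, 0.2],
    [0.1, 0.8, 0.0],
]
labels = ["A", "A", "B", "B"]

means = class_means(rows, labels)          # question 1
conf = pair_confidence(rows, labels, "B")  # questions 2 and 2a
```

Attribute indices are 0-based column positions. An attribute with a high mean in one label and a low mean in the other would be a candidate for being "characteristic" of that label. Checking every pair by brute force only works for a small number of attributes; for larger combinations, association rule mining or a decision tree would probably be more practical.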

Thanks in advance for your time; any help is really appreciated.

