TF-IDF per Class/Label?

CaptainChaos Member Posts: 17 Contributor II
edited November 2018 in Help
Hi guys,
The process posted below is the one I am working with.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.000">
  <operator activated="true" class="process" compatibility="5.2.000" expanded="true" name="Process">
    <process expanded="true" height="528" width="648">
      <operator activated="true" class="read_excel" compatibility="5.2.000" expanded="true" height="60" name="Read Excel" width="90" x="45" y="75">
        <parameter key="excel_file" value="C:\Dokumente und Einstellungen\rrojas\My Documents\myData\Spreadsheetversion1.0Forum.xls"/>
        <parameter key="imported_cell_range" value="A1:AG1500"/>
        <parameter key="first_row_as_names" value="false"/>
        <list key="annotations">
          <parameter key="0" value="Name"/>
        </list>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="Productnumber.true.integer.id"/>
          <parameter key="1" value="Class.true.binominal.label"/>
          <parameter key="2" value="Error 1.true.integer.attribute"/>
          <parameter key="3" value="Error 2.true.integer.attribute"/>
          <parameter key="4" value="Error 3.true.integer.attribute"/>
          <parameter key="5" value="Error 4.true.integer.attribute"/>
          <parameter key="6" value="Error 5.true.integer.attribute"/>
          <parameter key="7" value="Error 6.true.integer.attribute"/>
          <parameter key="8" value="Error 7.true.integer.attribute"/>
          <parameter key="9" value="Error 8.true.integer.attribute"/>
          <parameter key="10" value="Error 9.true.integer.attribute"/>
          <parameter key="11" value="Error 10.true.integer.attribute"/>
          <parameter key="12" value="Error 11.true.integer.attribute"/>
          <parameter key="13" value="Error 12.true.integer.attribute"/>
          <parameter key="14" value="Error 13.true.integer.attribute"/>
          <parameter key="15" value="Error 14.true.integer.attribute"/>
          <parameter key="16" value="Error 15.true.integer.attribute"/>
          <parameter key="17" value="Error 16.true.integer.attribute"/>
          <parameter key="18" value="Error 17.true.integer.attribute"/>
          <parameter key="19" value="Error 18.true.integer.attribute"/>
          <parameter key="20" value="Error 19.true.integer.attribute"/>
          <parameter key="21" value="Error 20.true.integer.attribute"/>
          <parameter key="22" value="Error 21.true.integer.attribute"/>
          <parameter key="23" value="Error 22.true.integer.attribute"/>
          <parameter key="24" value="Error 23.true.integer.attribute"/>
          <parameter key="25" value="Error 24.true.integer.attribute"/>
          <parameter key="26" value="Error 25.true.integer.attribute"/>
          <parameter key="27" value="Error 26.true.integer.attribute"/>
          <parameter key="28" value="Error 27.true.integer.attribute"/>
          <parameter key="29" value="Error 28.true.integer.attribute"/>
          <parameter key="30" value="Error 29.true.integer.attribute"/>
          <parameter key="31" value="Error 30.true.integer.attribute"/>
          <parameter key="32" value="Total Errors.true.integer.attribute"/>
        </list>
      </operator>
      <operator activated="true" class="generate_tfidf" compatibility="5.2.000" expanded="true" height="76" name="Generate TFIDF" width="90" x="246" y="165"/>
      <operator activated="true" class="data_to_similarity" compatibility="5.2.000" expanded="true" height="76" name="Data to Similarity" width="90" x="380" y="75">
        <parameter key="measure_types" value="NumericalMeasures"/>
        <parameter key="numerical_measure" value="CosineSimilarity"/>
      </operator>
      <connect from_op="Read Excel" from_port="output" to_op="Generate TFIDF" to_port="example set input"/>
      <connect from_op="Generate TFIDF" from_port="example set output" to_op="Data to Similarity" to_port="example set"/>
      <connect from_op="Data to Similarity" from_port="similarity" to_port="result 2"/>
      <connect from_op="Data to Similarity" from_port="example set" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>

First I will describe my data. The set consists of 1500 examples, each representing a product identified by its product number (the id). The products (examples) are grouped into two different classes/labels. As you can see in my process, I calculated the TF-IDF score of every attribute (Error 1 ... Error x) for every example (product).
In addition, I would like to know two more things:
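
To make the per-example score concrete, here is a minimal Python sketch. It assumes the textbook definition (term frequency within an example times the log of the inverse document frequency); the Generate TFIDF operator in RapidMiner may differ in details such as normalization. The `tfidf` function and the toy data are made up for illustration and are not taken from the spreadsheet.

```python
import math

def tfidf(rows):
    """rows: list of dicts mapping attribute name -> occurrence count."""
    n = len(rows)
    # Document frequency: number of examples in which each attribute occurs.
    df = {}
    for row in rows:
        for attr, count in row.items():
            if count > 0:
                df[attr] = df.get(attr, 0) + 1
    scores = []
    for row in rows:
        total = sum(row.values())
        # tf = share of this attribute in the example, idf = log(N / df).
        scores.append({
            attr: (count / total) * math.log(n / df[attr]) if count > 0 else 0.0
            for attr, count in row.items()
        })
    return scores

# Hypothetical toy data: three products, three error attributes.
products = [
    {"Error 1": 2, "Error 2": 0, "Error 3": 1},
    {"Error 1": 1, "Error 2": 3, "Error 3": 0},
    {"Error 1": 0, "Error 2": 0, "Error 3": 4},
]
scores = tfidf(products)
```

An attribute that appears in only a few examples gets a high score in the examples where it does appear; an attribute that appears in every example scores zero everywhere.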

1. Is it possible to calculate some kind of "TF-IDF score" not just for a single example but for a whole class/label, so that I know which attributes are most characteristic of a class/label?

2. I would like to find out which combinations of attributes are significant for a class/label.
2a. I would like to find out which correlations exist between attributes with respect to the class, so that one can make statements like: if an example has a certain combination of attributes, it belongs to a certain class.
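
One possible way to approach both questions outside RapidMiner is sketched below in Python. It is only an illustration under stated assumptions: for question 1 each class is treated as one large "document" by summing the attribute counts of its examples, and for question 2 it simply counts how often two attributes occur together within each class. The function names and the toy data are hypothetical, and the pair counts are plain frequencies, not a significance test.

```python
import math
from collections import Counter, defaultdict
from itertools import combinations

def class_tfidf(rows, labels):
    """TF-IDF where every class is treated as a single document."""
    totals = defaultdict(lambda: defaultdict(int))
    for row, label in zip(rows, labels):
        for attr, count in row.items():
            totals[label][attr] += count
    n_classes = len(totals)
    # Number of classes in which each attribute occurs at least once.
    df = defaultdict(int)
    for counts in totals.values():
        for attr, count in counts.items():
            if count > 0:
                df[attr] += 1
    result = {}
    for label, counts in totals.items():
        class_total = sum(counts.values())
        result[label] = {
            attr: (count / class_total) * math.log(n_classes / df[attr]) if count > 0 else 0.0
            for attr, count in counts.items()
        }
    return result

def pair_support_by_class(rows, labels):
    """Fraction of examples in each class in which both attributes of a pair occur."""
    class_sizes = Counter(labels)
    pair_counts = defaultdict(Counter)
    for row, label in zip(rows, labels):
        present = sorted(attr for attr, count in row.items() if count > 0)
        for pair in combinations(present, 2):
            pair_counts[label][pair] += 1
    return {
        label: {pair: c / class_sizes[label] for pair, c in counts.items()}
        for label, counts in pair_counts.items()
    }

# Hypothetical toy data: four products in two classes.
rows = [
    {"Error 1": 2, "Error 2": 0, "Error 3": 1},
    {"Error 1": 1, "Error 2": 0, "Error 3": 0},
    {"Error 1": 0, "Error 2": 3, "Error 3": 1},
    {"Error 1": 0, "Error 2": 1, "Error 3": 2},
]
labels = ["A", "A", "B", "B"]

scores = class_tfidf(rows, labels)
support = pair_support_by_class(rows, labels)
```

Note that with only two classes the idf factor is log(2/df), so any attribute that occurs in both classes scores zero. In that setting the class-level score only highlights attributes that are exclusive to one class, and comparing the relative frequencies per class directly may be more informative.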

Thanks in advance for your time; any help is really appreciated.
