
transform example set to histogram

keyser84
edited November 2018 in Help

Is there a way to work on the histogram of the data (as shown by the histogram plotter)?

In particular:
- given an example set with several instances
- discretize the values of one attribute into f bins
- return an example set with f attributes (one per bin) that contains only a single instance, whose values are the instance counts from the source example set
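The three steps above can be sketched outside RapidMiner in a few lines of Python. This is a minimal illustration, not a RapidMiner operator. It assumes equal-width bins and a bin range (`lo`, `hi`) fixed up front. The function name `histogram_vector` and the sample values are hypothetical. A shared range matters for the later comparison: if each example set were binned over its own minimum and maximum, bin i would cover a different value range in each set.

```python
from typing import List, Sequence


def histogram_vector(values: Sequence[float], f: int, lo: float, hi: float) -> List[int]:
    """Count how many values fall into each of f equal-width bins over [lo, hi].

    The result has length f: one "attribute" per bin, one "instance" overall.
    Values outside [lo, hi] are clamped into the first or last bin.
    """
    if f <= 0:
        raise ValueError("f must be positive")
    if hi <= lo:
        raise ValueError("hi must be greater than lo")
    width = (hi - lo) / f
    counts = [0] * f
    for v in values:
        idx = int((v - lo) // width)
        idx = min(max(idx, 0), f - 1)  # clamp; v == hi lands in the last bin
        counts[idx] += 1
    return counts


if __name__ == "__main__":
    values = [5.1, 4.9, 4.7, 7.0, 6.4, 6.9, 6.3, 5.8]  # made-up attribute values
    print(histogram_vector(values, f=3, lo=4.0, hi=8.0))  # [3, 3, 2]
```

Calling it with the same `f`, `lo` and `hi` for every example set yields vectors of equal length whose positions mean the same value ranges.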

I want to use this to get a representation of an example set and then compare its histogram against the histograms of other example sets (e.g. by applying a Euclidean or Manhattan distance measure to the histogram vectors).
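The comparison step can be sketched as follows, assuming both histogram vectors were built with the same number of bins and the same bin boundaries. The function names are hypothetical. The `normalize` helper goes beyond the question: it divides counts by the number of instances. Without it, a larger example set tends to be "farther away" simply because its counts are larger.

```python
import math
from typing import List, Sequence


def _check_same_length(h1: Sequence[float], h2: Sequence[float]) -> None:
    if len(h1) != len(h2):
        raise ValueError("histograms must have the same number of bins")


def euclidean(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Euclidean (L2) distance between two histogram vectors."""
    _check_same_length(h1, h2)
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(h1, h2)))


def manhattan(h1: Sequence[float], h2: Sequence[float]) -> float:
    """Manhattan (L1) distance between two histogram vectors."""
    _check_same_length(h1, h2)
    return sum(abs(a - b) for a, b in zip(h1, h2))


def normalize(h: Sequence[float]) -> List[float]:
    """Turn bin counts into relative frequencies (an all-zero histogram stays all zeros)."""
    total = sum(h)
    return [c / total for c in h] if total else [0.0] * len(h)


if __name__ == "__main__":
    a = [3, 3, 2]  # made-up bin counts for example set A
    b = [1, 4, 3]  # made-up bin counts for example set B
    print(euclidean(a, b))  # sqrt(6), about 2.449
    print(manhattan(a, b))  # 4
    print(manhattan(normalize(a), normalize(b)))  # 0.5
```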

There are operators like BinDiscretization and Aggregation, but is there another operator that does exactly what the histogram plotter shows?

Answers

  • land (RapidMiner Certified Analyst, RapidMiner Certified Expert)
    Hi,
    Unfortunately, there is no operator that performs this task in one step, but it is quite easy to combine discretization and aggregation, as you already suggested. The process below restricts BinDiscretization to attribute a1 (10 bins) and then uses Aggregation to count the examples in each bin, which should do the trick.
    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSource" class="ExampleSource">
            <parameter key="attributes" value="C:\Dokumente und Einstellungen\sland\Eigene Dateien\yale\workspace\sample\data\iris.aml"/>
        </operator>
        <operator name="ToHistogram" class="OperatorChain" expanded="yes">
            <!-- apply the inner operator only to attribute a1 -->
            <operator name="AttributeSubsetPreprocessing" class="AttributeSubsetPreprocessing" expanded="yes">
                <parameter key="condition_class" value="attribute_name_filter"/>
                <parameter key="attribute_name_regex" value="a1"/>
                <!-- replace the numeric values of a1 by one of 10 bin labels -->
                <operator name="BinDiscretization" class="BinDiscretization">
                    <parameter key="number_of_bins" value="10"/>
                    <parameter key="range_name_type" value="short"/>
                </operator>
            </operator>
            <!-- group the examples by the bin of a1 and count them: one row per bin value that occurs -->
            <operator name="Aggregation" class="Aggregation">
                <list key="aggregation_attributes">
                  <parameter key="a1" value="count"/>
                </list>
                <parameter key="group_by_attributes" value="a1"/>
            </operator>
        </operator>
    </operator>
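The process above yields one row per bin (the bin value of a1 plus its count) rather than the single instance with f attributes asked for in the question. A group-by also typically emits rows only for bins that actually contain examples. If the aggregated rows are compared outside RapidMiner, a small pivot step like the following Python sketch fills empty bins with 0, so every example set maps to a vector of the same length. The function name `rows_to_vector` and the bin labels are hypothetical. The real labels depend on the discretization settings (e.g. `range_name_type`) and should be taken from the discretized attribute.

```python
from typing import Dict, Iterable, List, Sequence, Tuple


def rows_to_vector(rows: Iterable[Tuple[str, int]], bin_labels: Sequence[str]) -> List[int]:
    """Pivot (bin label, count) rows into one count vector ordered by bin_labels.

    Bins without any examples produce no row in a group-by, so they become 0.
    """
    counts: Dict[str, int] = dict(rows)
    unknown = set(counts) - set(bin_labels)
    if unknown:
        raise ValueError(f"unexpected bin labels: {sorted(unknown)}")
    return [counts.get(label, 0) for label in bin_labels]


if __name__ == "__main__":
    labels = [f"range{i}" for i in range(1, 6)]  # hypothetical labels for 5 bins
    aggregated = [("range1", 4), ("range3", 7), ("range5", 1)]  # made-up group-by output
    print(rows_to_vector(aggregated, labels))  # [4, 0, 7, 0, 1]
```

Passing the full, ordered list of bin labels explicitly keeps the vector positions aligned across example sets, even when some of them have empty bins.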
  • keyser84
    Thank you... this would help.