OutlierDistanceBasedDetection

alexman Member Posts: 9 Contributor II
edited November 2018 in Help
Hi,

I have an ExampleSet with several attributes, and I would like to apply outlier detection to every numerical attribute separately. Is this possible, and what is the best way to do it? I can apply outlier detection to the whole table, but I would like to do it per attribute. For example:

height   weight   ...  // more attributes
188      80
185      150
186      83
189      89
190      87
192      86
145      88

I would like to get 145 (height) and 150 (weight) separately. [A separate process for each attribute using DBOutlierOperator would probably work, but would not be efficient.]
DBOutlierOperator(OperatorDescription description) cannot be applied to a single attribute of an ExampleSet. AttributeSelectionExampleSet, which filters the attributes I want in the ExampleSet, would probably be useful, but how do I apply the outlier function to each attribute?

thanks

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    you could use a combination of the FeatureIterator and the AttributeSubsetPreprocessing operator, which delivers only a subset of the ExampleSet's attributes to its child operators. Since this is a complex process, I will post a sample below:

    <operator name="Root" class="Process" expanded="yes">
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="sum classification"/>
        </operator>
        <operator name="IOStorer" class="IOStorer">
            <parameter key="name" value="Store"/>
            <parameter key="io_object" value="ExampleSet"/>
            <parameter key="remove_from_process" value="false"/>
        </operator>
        <operator name="FeatureIterator" class="FeatureIterator" expanded="yes">
            <parameter key="work_on_input" value="false"/>
            <operator name="IORetriever" class="IORetriever">
                <parameter key="name" value="Store"/>
                <parameter key="io_object" value="ExampleSet"/>
            </operator>
            <operator name="AttributeSubsetPreprocessing" class="AttributeSubsetPreprocessing" expanded="yes">
                <parameter key="condition_class" value="attribute_name_filter"/>
                <parameter key="parameter_string" value="att5"/>
                <parameter key="attribute_name_regex" value="%{loop_feature}"/>
                <operator name="DetectionOnSingleAttribute" class="DensityBasedOutlierDetection">
                    <parameter key="distance" value="1.0"/>
                    <parameter key="proportion" value="0.5"/>
                </operator>
                <operator name="DoingSomething" class="OperatorChain" expanded="yes">
                    <operator name="ChangeAttributeRole" class="ChangeAttributeRole">
                        <parameter key="name" value="Outlier"/>
                    </operator>
                    <operator name="ChangeAttributeName" class="ChangeAttributeName">
                        <parameter key="old_name" value="Outlier"/>
                        <parameter key="new_name" value="Outlier_%{loop_feature}"/>
                    </operator>
                </operator>
            </operator>
            <operator name="IOStorer (2)" class="IOStorer">
                <parameter key="name" value="Store"/>
                <parameter key="io_object" value="ExampleSet"/>
                <parameter key="remove_from_process" value="false"/>
            </operator>
        </operator>
        <operator name="IOConsumer" class="IOConsumer">
            <parameter key="io_object" value="ExampleSet"/>
        </operator>
        <operator name="IORetriever (2)" class="IORetriever">
            <parameter key="name" value="Store"/>
            <parameter key="io_object" value="ExampleSet"/>
        </operator>
    </operator>


    The problem is the behavior of the FeatureIterator, which does not deliver the changed ExampleSet after finishing. That is why we have to use the IOStorer and IORetriever operators to save the generated ExampleSet ourselves. All we actually need from the FeatureIterator is the macro it defines, which holds the name of each regular attribute in turn, so that we can use it in the AttributeSubsetPreprocessing condition.

    This sample only renames the resulting outlier attributes, but you could well do something more intelligent, such as combining the results for each attribute using an AttributeConstruction, or something else.

    Greetings,
      Sebastian
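
For readers who want to see the per-attribute idea outside RapidMiner, here is a minimal Python sketch. It is not the operator's implementation: it assumes the usual distance-based definition (a value is an outlier when at least `proportion` of the other values lie farther than `distance` away), and the function names, the `Outlier_<name>` keys, and the threshold `distance=10.0` are all assumptions chosen for the sample table from the question.

```python
def db_outlier_flags(values, distance, proportion):
    """Flag each value for which at least `proportion` of the other
    values lie farther than `distance` away (assumed definition)."""
    flags = []
    for i, v in enumerate(values):
        others = [w for j, w in enumerate(values) if j != i]
        far = sum(1 for w in others if abs(v - w) > distance)
        # With no other values there is nothing to compare against.
        flags.append(bool(others) and far / len(others) >= proportion)
    return flags


def outliers_per_attribute(table, distance, proportion):
    """Run the detection on every column separately, one flag column each."""
    return {
        f"Outlier_{name}": db_outlier_flags(column, distance, proportion)
        for name, column in table.items()
    }


# Sample data from the question; thresholds are illustrative assumptions.
table = {
    "height": [188, 185, 186, 189, 190, 192, 145],
    "weight": [80, 150, 83, 89, 87, 86, 88],
}
result = outliers_per_attribute(table, distance=10.0, proportion=0.5)

for name, flags in result.items():
    column = table[name.replace("Outlier_", "", 1)]
    print(name, [v for v, flagged in zip(column, flags) if flagged])
```

With these thresholds the sketch flags 145 for height and 150 for weight. Using one `distance` for all attributes is a simplification; attributes on very different scales would need their own thresholds or a normalization step first.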
  • alexman Member Posts: 9 Contributor II
    ok,

    I have now detected the outliers, but I need to get the individual results.

    I have a table, but I need to select the attribute values where the outlier flag is true.

    An SQL statement would be "select * from table where outlier_vel=true", but what I have are the results of the process (in an ExampleSet), and I cannot run an SQL-like query on them.

    What is the best way to query the ExampleSet results (applying filters)?

    thanks a lot!
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    did you try the ExampleFilter? It allows several conditions for filtering examples from the set.

    Greetings,
      Sebastian
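
As a rough illustration of what such a filter does, the following Python sketch mimics the SQL-style "where outlier = true" selection on a list of rows. It is only an analogy, not RapidMiner code: `filter_examples` is a hypothetical helper, and the `Outlier_height` / `Outlier_weight` flags are hard-coded here rather than computed.

```python
def filter_examples(rows, attribute, wanted=True):
    """Keep the rows whose boolean `attribute` equals `wanted`."""
    return [row for row in rows if row.get(attribute) is wanted]


heights = [188, 185, 186, 189, 190, 192, 145]
weights = [80, 150, 83, 89, 87, 86, 88]

# Flags are hard-coded for illustration; in practice they would come
# from the outlier detection step.
rows = [
    {"height": h, "weight": w,
     "Outlier_height": h == 145, "Outlier_weight": w == 150}
    for h, w in zip(heights, weights)
]

# Equivalent in spirit to: select height from table where Outlier_height = true
print([row["height"] for row in filter_examples(rows, "Outlier_height")])
print([row["weight"] for row in filter_examples(rows, "Outlier_weight")])
```

Each flag column is filtered independently, so the per-attribute results stay separate.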
  • cschie Member Posts: 6 Contributor II
    Hi Sebastian,

    I am tackling a similar problem. I tried the "Filter Examples" operator of version 5.0.

    No matter how I set the parameter string (outlier=false / outlier=true), the result set is empty.

    <operator activated="true" class="filter_examples" expanded="true" height="76" name="Filter Examples" width="90" x="313" y="435">
        <parameter key="condition_class" value="attribute_value_filter"/>
        <parameter key="parameter_string" value="outlier=false"/>
    </operator>

    Maybe some syntax problem?

    Greetings,
    Chris
  • haddock Member Posts: 849 Maven
    G'Day Chris,

    LOF produces a number rather than a boolean. Right-click on the operator, then F1 -> Description, which says:
    "Afterwards LOFs are added as values for a special real-valued outlier attribute in the example set which the operator will return."
  • cschie Member Posts: 6 Contributor II
    G'day haddock,

    thanks for the hint. It works with LOF (filtering for "outlier < 1").

    So the question is: what is the correct syntax with boolean filter parameters?

    Cheers,
    Chris
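
The difference between a real-valued LOF score and a boolean flag can be sketched in Python. This is a hypothetical, simplified condition matcher, not how the Filter Examples operator parses its parameter string; `parse_condition`, `matches`, and `filter_rows` are names invented for the illustration. It shows one plausible reason why "outlier=false" returns nothing on a numeric column while "outlier<1" works.

```python
import operator

# Two-character symbols are checked first so "<=" is not read as "<".
OPS = [("<=", operator.le), (">=", operator.ge),
       ("<", operator.lt), (">", operator.gt), ("=", operator.eq)]


def parse_condition(text):
    """Split 'name op value' into (name, comparison function, raw value)."""
    for symbol, func in OPS:
        if symbol in text:
            name, raw = text.split(symbol, 1)
            return name.strip(), func, raw.strip()
    raise ValueError(f"no comparison operator in {text!r}")


def matches(value, func, raw):
    """Compare using the column's own type; a type mismatch never matches."""
    if isinstance(value, bool):
        if raw.lower() not in ("true", "false"):
            return False
        return func(value, raw.lower() == "true")
    try:
        return func(value, float(raw))
    except ValueError:
        return False  # e.g. comparing a numeric score with "false"


def filter_rows(rows, condition):
    name, func, raw = parse_condition(condition)
    return [row for row in rows if matches(row[name], func, raw)]


scores = [{"outlier": 0.97}, {"outlier": 1.02}, {"outlier": 3.4}]  # made-up LOF-like values
flags = [{"outlier": False}, {"outlier": True}]                    # boolean flag column

print(filter_rows(scores, "outlier=false"))  # numeric column: nothing matches
print(filter_rows(scores, "outlier<1"))      # numeric threshold works
print(filter_rows(flags, "outlier=false"))   # boolean column: works as expected
```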
  • cschie Member Posts: 6 Contributor II
    I checked the example "processes->02_preprocessing_18_OutlierDetection".

    It also uses a filter for "outlier=false", and it works.
    So I rebuilt my workflow one more time, and now it works. I cannot identify any differences...

    Probably another case where the problem sits in front of the screen.

    Thanks for your help anyway!