
imbalanced data

m_r_nour Member Posts: 35 Maven
edited November 2018 in Help

Hi,

How can I solve an imbalanced-data problem?

I used Weka's AdaBoostM1, but it doesn't help at all.
1. I used a sampling method to balance the data and performance improved, but as far as I know there should be far better methods for solving the imbalance problem.
2. Moreover, LibSVM can be used in weighted mode, but I do not know how to use it or how to tune the LibSVM parameters such as the cost, the class weights, and so on.
And how can it be used with MetaCost?

I'd appreciate your help.

Regards
REZA
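Reza's two ideas, rebalancing by sampling and weighting the classes in LibSVM, can be sketched in a few lines of plain Python. This is a minimal sketch, not RapidMiner or LibSVM code: `inverse_frequency_weights` and `random_oversample` are hypothetical helpers, the n / (k * count) weighting is one common heuristic rather than a rule, and the class counts in the usage block are made up (only the class names come from this thread). Weights computed this way could serve as a starting point for the `class_weights` list of the LibSVMLearner, to be tuned together with the SVM's cost parameter rather than taken as final.

```python
import random
from collections import Counter


def inverse_frequency_weights(labels):
    """Weight each class by n / (k * count), where n is the number of
    examples and k the number of classes: rare classes get weights above 1,
    frequent classes weights below 1 (one common heuristic, not the only one)."""
    counts = Counter(labels)
    n, k = len(labels), len(counts)
    return {cls: n / (k * c) for cls, c in counts.items()}


def random_oversample(examples, labels, seed=0):
    """Duplicate randomly chosen examples of each smaller class until every
    class has as many examples as the largest one."""
    rng = random.Random(seed)
    by_class = {}
    for x, y in zip(examples, labels):
        by_class.setdefault(y, []).append(x)
    target = max(len(xs) for xs in by_class.values())
    out_x, out_y = [], []
    for y, xs in by_class.items():
        extra = [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(xs + extra)
        out_y.extend([y] * target)
    return out_x, out_y


if __name__ == "__main__":
    # Hypothetical class counts; only the class names come from the thread.
    labels = ["anaA"] * 80 + ["anaB"] * 15 + ["prometa"] * 5
    print(inverse_frequency_weights(labels))
    xs, ys = random_oversample(list(range(len(labels))), labels)
    print(Counter(ys))
```

Oversampling only duplicates existing examples, so it belongs inside each cross-validation training fold: duplicating before the split lets copies of the same example land in both training and test folds, which inflates the measured performance.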

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Reza,
    just put the LibSVM learner inside the MetaCost learner. MetaCost is a so-called meta learner: it uses another, inner learning scheme.
    This time you could simply have read the manual; this is why haddock repeats it so often. Although this IS a help forum, other people have to spend their time giving you hints, so I think it's fair to expect that you make your best effort to cope with the problem yourself first. That should always include the (admittedly sparse) documentation.

    Greetings,
      Sebastian
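As a rough picture of what MetaCost does with its inner learner, here is a simplified sketch of the relabeling procedure from Domingos' 1999 MetaCost paper, in plain Python rather than RapidMiner. It bags the inner learner, averages the bagged models' class probabilities for every training example, and relabels each example with the class of lowest expected cost; the inner learner (LibSVM here) is then retrained on the relabeled data. Assumptions: `fit`, `min_expected_cost_class` and `metacost_relabel` are hypothetical names, the cost convention (row = predicted class, column = true class) follows the paper and may not match RapidMiner's orientation of `cost_matrix`, and this version always averages all bagged models (the paper also describes using only models whose bootstrap sample excluded the example).

```python
import random


def min_expected_cost_class(probs, cost):
    """Return the class index i that minimises sum_j probs[j] * cost[i][j].

    Assumed convention (as in Domingos' MetaCost paper): cost[i][j] is the
    cost of predicting class i when the true class is j.
    """
    k = len(probs)
    if len(cost) != k or any(len(row) != k for row in cost):
        raise ValueError(f"cost matrix must be {k}x{k} for {k} classes")
    expected = [sum(probs[j] * cost[i][j] for j in range(k)) for i in range(k)]
    return min(range(k), key=lambda i: expected[i])


def metacost_relabel(examples, labels, fit, cost, n_bags=10, seed=0):
    """Relabel a training set the way MetaCost does (simplified).

    fit(xs, ys) must return a function x -> list of k class probabilities,
    where classes are the indices 0..k-1. The inner learner is afterwards
    retrained on (examples, returned labels) to obtain the final model.
    """
    rng = random.Random(seed)
    n, k = len(examples), len(cost)
    models = []
    for _ in range(n_bags):
        # Bootstrap sample: draw n indices with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        models.append(fit([examples[i] for i in idx], [labels[i] for i in idx]))
    new_labels = []
    for x in examples:
        # Average the bagged models' class-probability estimates.
        avg = [0.0] * k
        for model in models:
            for c, p in enumerate(model(x)):
                avg[c] += p / n_bags
        # Relabel with the class of minimum expected cost.
        new_labels.append(min_expected_cost_class(avg, cost))
    return new_labels


if __name__ == "__main__":
    COST = [[0.0, 3.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]  # matrix from the thread
    probs = [0.5, 0.3, 0.2]
    # Class 0 is most probable, but predicting it risks a cost of 3 when the truth is class 1.
    print(min_expected_cost_class(probs, COST))  # 1
```

With the matrix used in this thread, an example with estimated probabilities 0.5, 0.3 and 0.2 is relabeled to class 1 even though class 0 is the most probable: the expected costs of predicting classes 0, 1 and 2 are 1.1, 0.7 and 0.8.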
  • m_r_nour Member Posts: 35 Maven
    Hi,

    thanks, Sebastian.

    My code is:
    <operator name="Root" class="Process" expanded="yes">
        <operator name="CSVExampleSource" class="CSVExampleSource">
            <parameter key="filename" value="data_ver5_all.csv"/>
            <parameter key="label_name" value="CellCycle"/>
        </operator>
        <operator name="ExampleFilter" class="ExampleFilter" breakpoints="after">
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="label=anaA||label=anaB||label=prometa"/>
        </operator>
        <operator name="Normalization" class="Normalization">
        </operator>
        <operator name="Random Optimizer" class="RandomOptimizer" expanded="yes">
            <parameter key="iterations" value="100"/>
            <operator name="Xvalidation" class="XValidation" expanded="yes">
                <parameter key="keep_example_set" value="true"/>
                <parameter key="create_complete_model" value="true"/>
                <operator name="MetaCost" class="MetaCost" expanded="yes">
                    <parameter key="keep_example_set" value="true"/>
                    <parameter key="cost_matrix" value="[0.0 3.0 1.0;1.0 0.0 1.0;1.0 1.0 0.0]"/>
                    <operator name="LibSVMLearner" class="LibSVMLearner">
                        <list key="class_weights">
                        </list>
                    </operator>
                </operator>
                <operator name="OperatorChain (2)" class="OperatorChain" expanded="yes">
                    <operator name="ModelApplier (2)" class="ModelApplier">
                        <parameter key="keep_model" value="true"/>
                        <list key="application_parameters">
                        </list>
                        <parameter key="create_view" value="true"/>
                    </operator>
                    <operator name="Performance (2)" class="Performance">
                        <parameter key="keep_example_set" value="true"/>
                    </operator>
                </operator>
            </operator>
            <operator name="AverageBuilder" class="AverageBuilder">
            </operator>
            <operator name="Performance (3)" class="Performance">
                <parameter key="keep_example_set" value="true"/>
            </operator>
            <operator name="ProcessLog" class="ProcessLog">
                <parameter key="filename" value="output_%{a}.log"/>
                <list key="log">
                  <parameter key="File" value="operator.CSVExampleSource.parameter.filename"/>
                  <parameter key="Learner" value="operator.Classifier.parameter.select_which"/>
                  <parameter key="Performance" value="operator.Xvalidation.value.performance"/>
                  <parameter key="Deviation" value="operator.Xvalidation.value.deviation"/>
                  <parameter key="Feature Selection On|Off" value="operator.FS SAM.parameter.enable"/>
                  <parameter key="Pro_meta merging" value="operator.Class merging.parameter.enable"/>
                </list>
                <parameter key="sorting_dimension" value="3"/>
            </operator>
        </operator>
    </operator>

    but it halts with a "process failed" message and I do not know why.

    I'd appreciate it if you could help me with this matter.

    Regards
    REZA
  • haddock Member Posts: 849 Maven
    <operator name="Root" class="Process" expanded="yes">
        <operator name="CSVExampleSource" class="CSVExampleSource" activated="no">
            <parameter key="filename" value="data_ver5_all.csv"/>
            <parameter key="label_name" value="CellCycle"/>
        </operator>
        <operator name="ExampleFilter" class="ExampleFilter" breakpoints="after" activated="no">
        <description text="This will not work - we've already discussed why - wake up"/>
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="label=anaA||label=anaB||label=prometa"/>
        </operator>
        <operator name="ExampleSetGenerator" class="ExampleSetGenerator">
            <parameter key="target_function" value="simple non linear classification"/>
        </operator>
        <operator name="Normalization" class="Normalization">
        </operator>
        <operator name="Random Optimizer" class="RandomOptimizer" expanded="yes">
            <parameter key="iterations" value="10"/>
            <operator name="Xvalidation" class="XValidation" expanded="yes">
                <parameter key="keep_example_set" value="true"/>
                <parameter key="create_complete_model" value="true"/>
                <operator name="MetaCost" class="MetaCost" expanded="yes">
                    <parameter key="keep_example_set" value="true"/>
                    <parameter key="cost_matrix" value="[0.0 3.0 1.0;1.0 0.0 1.0;1.0 1.0 0.0]"/>
                    <operator name="LibSVMLearner" class="LibSVMLearner">
                        <list key="class_weights">
                        </list>
                    </operator>
                </operator>
                <operator name="OperatorChain (2)" class="OperatorChain" expanded="yes">
                    <operator name="ModelApplier (2)" class="ModelApplier">
                        <parameter key="keep_model" value="true"/>
                        <list key="application_parameters">
                        </list>
                        <parameter key="create_view" value="true"/>
                    </operator>
                    <operator name="Performance (2)" class="Performance">
                        <parameter key="keep_example_set" value="true"/>
                    </operator>
                </operator>
            </operator>
            <operator name="AverageBuilder" class="AverageBuilder" activated="no">
            </operator>
            <operator name="Performance (3)" class="Performance" activated="no">
                <description text="Why on earth is this here? Disable it."/>
                <parameter key="keep_example_set" value="true"/>
            </operator>
            <operator name="ProcessLog" class="ProcessLog">
                <parameter key="filename" value="output_%{a}.log"/>
                <list key="log">
                  <parameter key="File" value="operator.CSVExampleSource.parameter.filename"/>
                  <parameter key="Performance" value="operator.Xvalidation.value.performance"/>
                  <parameter key="Deviation" value="operator.Xvalidation.value.deviation"/>
                </list>
                <parameter key="sorting_dimension" value="3"/>
            </operator>
        </operator>
    </operator>
  • m_r_nour Member Posts: 35 Maven
    Hi,

    but it doesn't work. It seems you just disabled the AverageBuilder and its Performance operator. I did the same with my data, but again I get the "process failed" message. Thanks anyway for your time and consideration.
    <operator name="Root" class="Process" expanded="yes">
        <operator name="CSVExampleSource" class="CSVExampleSource">
            <parameter key="filename" value="I:\WORK\ver5\RM\DATA\data_ver5_all.csv"/>
            <parameter key="label_name" value="CellCycle"/>
        </operator>
        <operator name="ExampleFilter" class="ExampleFilter">
        <description text="This will not work - we've already discussed why - wake up"/>
            <parameter key="condition_class" value="attribute_value_filter"/>
            <parameter key="parameter_string" value="label=anaA||label=anaB||label=prometa"/>
        </operator>
        <operator name="Normalization" class="Normalization">
        </operator>
        <operator name="Random Optimizer" class="RandomOptimizer" expanded="yes">
            <parameter key="iterations" value="10"/>
            <operator name="Xvalidation" class="XValidation" expanded="yes">
                <parameter key="keep_example_set" value="true"/>
                <parameter key="create_complete_model" value="true"/>
                <operator name="MetaCost" class="MetaCost" expanded="yes">
                    <parameter key="keep_example_set" value="true"/>
                    <parameter key="cost_matrix" value="[0.0 3.0 1.0;1.0 0.0 1.0;1.0 1.0 0.0]"/>
                    <operator name="LibSVMLearner" class="LibSVMLearner">
                        <list key="class_weights">
                        </list>
                    </operator>
                </operator>
                <operator name="OperatorChain (2)" class="OperatorChain" expanded="yes">
                    <operator name="ModelApplier (2)" class="ModelApplier">
                        <parameter key="keep_model" value="true"/>
                        <list key="application_parameters">
                        </list>
                        <parameter key="create_view" value="true"/>
                    </operator>
                    <operator name="Performance (2)" class="Performance">
                        <parameter key="keep_example_set" value="true"/>
                    </operator>
                </operator>
            </operator>
            <operator name="ProcessLog" class="ProcessLog">
                <parameter key="filename" value="output_%{a}.log"/>
                <list key="log">
                  <parameter key="File" value="operator.CSVExampleSource.parameter.filename"/>
                  <parameter key="Performance" value="operator.Xvalidation.value.performance"/>
                  <parameter key="Deviation" value="operator.Xvalidation.value.deviation"/>
                </list>
                <parameter key="sorting_dimension" value="3"/>
            </operator>
        </operator>
    </operator>
  • haddock Member Posts: 849 Maven
    It really helps if you include the error message. This code works on generated examples on my machine right now. I disabled the AverageBuilder and the final Performance operator because they were causing the failure of the original code in your first post.

  • m_r_nour Member Posts: 35 Maven
    G Nov 28, 2009 5:50:44 PM: [Fatal] ArrayIndexOutOfBoundsException occured in 1st application of ModelApplier (2) (ModelApplier)
    G Nov 28, 2009 5:50:44 PM: [Fatal] Process failed: operator cannot be executed (3). Check the log messages...
              Root[1] (Process)
              +- CSVExampleSource[1] (CSVExampleSource)
              +- ExampleFilter[1] (ExampleFilter)
              +- Normalization[1] (Normalization)
              +- Random Optimizer[1] (RandomOptimizer)
                +- Xvalidation[1] (XValidation)
                |  +- MetaCost[1] (MetaCost)
                |  |  +- LibSVMLearner[10] (LibSVMLearner)
                |  +- OperatorChain (2)[1] (OperatorChain)
    here ==>   |    +- ModelApplier (2)[1] (ModelApplier)
                |    +- Performance (2)[0] (Performance)
                +- ProcessLog[0] (ProcessLog)
  • haddock Member Posts: 849 Maven
    Then it must be the data, as the generator version works. My guess is that your filter doesn't work the way you think, because you use '||' where you should use '|'.
  • m_r_nour Member Posts: 35 Maven
    Hi Haddock,

    no, even after I changed the code as you suggested, I get the same problem.

    But: there are 7 classes, and with the filter I reduced them to 3, yet the classifier is applied to the data as if it had 7 classes, because when I changed the size of the cost matrix to 7x7, it works.

    So... how can ...?

    Thanks for your time.

    Regards
    REZA
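Reza's last observation suggests a likely explanation, which is an inference from this thread rather than confirmed RapidMiner behaviour: ExampleFilter removes examples, but the label attribute apparently still knows all seven class values, so MetaCost keeps indexing classes 0 to 6 and a 3x3 cost matrix is read out of bounds when the model is applied. The toy Python model below illustrates that mechanism; `build_mapping`, `cost_lookup`, the placeholder classes x1 to x4 and the order of the values are all hypothetical.

```python
def build_mapping(values):
    """Map each distinct nominal value to an index, in first-seen order
    (a toy stand-in for a nominal label's value list; hypothetical)."""
    mapping = {}
    for v in values:
        mapping.setdefault(v, len(mapping))
    return mapping


def cost_lookup(mapping, cost, predicted, actual):
    # The matrix is indexed by each value's mapping index, not by its
    # position among the classes actually present in the data.
    return cost[mapping[predicted]][mapping[actual]]


# Placeholders x1..x4 stand for the four classes the filter removes;
# the order of all seven values is hypothetical.
ALL_VALUES = ["x1", "x2", "x3", "x4", "anaA", "anaB", "prometa"]
KEEP = {"anaA", "anaB", "prometa"}
COST_3X3 = [[0.0, 3.0, 1.0], [1.0, 0.0, 1.0], [1.0, 1.0, 0.0]]

full_mapping = build_mapping(ALL_VALUES)          # built when the CSV is read
remaining = [v for v in ALL_VALUES if v in KEEP]  # what the example filter leaves

# Filtering examples does not rebuild full_mapping: "anaA" is still index 4.
try:
    cost_lookup(full_mapping, COST_3X3, "anaA", "anaB")
except IndexError as err:
    print("IndexError:", err)  # analogue of the ArrayIndexOutOfBoundsException

# Fix A: rebuild the mapping from the values that are left (indices 0..2).
compact = build_mapping(remaining)
print(cost_lookup(compact, COST_3X3, "anaA", "anaB"))  # 3.0

# Fix B (the workaround found above): a 7x7 matrix covering every index.
COST_7X7 = [[0.0 if i == j else 1.0 for j in range(7)] for i in range(7)]
print(cost_lookup(full_mapping, COST_7X7, "anaA", "anaB"))  # 1.0
```

If this is the cause, the 7x7 workaround only reproduces the intended costs when the non-trivial entries (such as the 3.0) are placed in the rows and columns that belong to anaA, anaB and prometa in the label's value order; the alternative is to make sure the label's value list is rebuilt after filtering.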