how to decrease model size - delete where weight

emolano Member Posts: 13 Contributor II
edited May 2019 in Help
Hi there.. me again :)
I have a process that creates a text mining model. The model is too big, so I want it to use only the data where weight > 0. In the weight table I see lots of words with weight = 0 that I want to delete, i.e. not include in the model. Is there a way to do this?
Thanks again for your help!
Here is my code:
 
<operator name="Root" class="Process" expanded="yes">
    <description text="#ylt#h3#ygt#text Data Mining#ylt#/h3#ygt##ylt#p#ygt##ylt#/p#ygt#"/>
    <operator name="DatabaseExampleSource" class="DatabaseExampleSource">
        <parameter key="database_url" value="jdbc:mysql://bi01:3306/database"/>
        <parameter key="username" value="user"/>
        <parameter key="password" value="pwd"/>
        <parameter key="query" value="SELECT `ID_NUM`, `SHORT_DESC`, `PLATFORM` FROM `TABLEX`;"/>
        <parameter key="label_attribute" value="PLATFORM"/>
        <parameter key="id_attribute" value="ID_NUM"/>
    </operator>
    <operator name="StringTextInput" class="StringTextInput" expanded="yes">
        <parameter key="filter_nominal_attributes" value="true"/>
        <parameter key="remove_original_attributes" value="true"/>
        <parameter key="default_content_language" value="english"/>
        <parameter key="output_word_list" value="crmtraining_words.list"/>
        <list key="namespaces">
        </list>
        <operator name="StringTokenizer" class="StringTokenizer">
        </operator>
        <operator name="EnglishStopwordFilter" class="EnglishStopwordFilter">
        </operator>
        <operator name="TokenLengthFilter" class="TokenLengthFilter">
            <parameter key="min_chars" value="2"/>
        </operator>
        <operator name="ToLowerCaseConverter" class="ToLowerCaseConverter">
        </operator>
        <operator name="PorterStemmer" class="PorterStemmer">
        </operator>
        <operator name="StopwordFilterFile" class="StopwordFilterFile">
            <parameter key="file" value="stop_filter_platform.txt"/>
        </operator>
        <operator name="TermNGramGenerator" class="TermNGramGenerator">
            <parameter key="max_length" value="3"/>
        </operator>
    </operator>
    <operator name="LibSVMLearner" class="LibSVMLearner">
        <parameter key="kernel_type" value="linear"/>
        <list key="class_weights">
        </list>
    </operator>
    <operator name="ModelWriter" class="ModelWriter">
        <parameter key="model_file" value="model.mod"/>
    </operator>
</operator>

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    You could apply a weighting scheme before the learner and keep only the attributes with a non-zero weight. This would reduce the number of attributes and hence the length of the support vectors. The SVMWeighting operator gives weights similar to the SVM's own weight vector. If you need to apply the weights later on, you could use the AttributeWeightsApplier operator.

    Greetings,
      Sebastian
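
A rough sketch of how the suggestion above might look in the process, placed between StringTextInput and LibSVMLearner. The SVMWeighting operator is the one named in the answer; the AttributeWeightSelection operator, the OperatorChain wrapper, and the parameter keys `weight` and `weight_relation` are assumptions based on older RapidMiner operator sets and should be checked against the installed version before use.

```xml
<!-- Sketch only: operator classes and parameter keys are assumptions; verify them in your RapidMiner version. -->
<operator name="WeightBasedSelection" class="OperatorChain" expanded="yes">
    <!-- Derive one weight per attribute (word) from a linear SVM, as suggested in the answer. -->
    <operator name="SVMWeighting" class="SVMWeighting">
    </operator>
    <!-- Assumed operator: keep only the attributes whose weight is greater than 0. -->
    <operator name="AttributeWeightSelection" class="AttributeWeightSelection">
        <parameter key="weight" value="0.0"/>
        <parameter key="weight_relation" value="greater"/>
    </operator>
</operator>
```

If this works as assumed, LibSVMLearner would then train only on the remaining attributes, which is what should shrink the stored model. The same attribute selection would also have to be applied to any new data before the model is used on it.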