"Text Classification with two languages in one model"

AnjaF · September 2009

Hi,

I hope that my question is a newbie question.
I want to build a text classification tool for two languages. For that I want to create two TextInput elements with two different stemmer elements (one for each language). Is it possible to feed both TextInput elements, with the same two labels, into one LibSVMLearner?

Thanks for any help,
Anja

land · September 2009

Hi Anja,
unfortunately I'm not quite sure what you are aiming at. The TextInput operators produce an example set that contains the information about each processed text as a single example. The learning operators (and hence LibSVM, too) use an example set for learning.
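
To make "one example per text" a bit more concrete, here is a rough Python sketch. It is not RapidMiner code: the function name `texts_to_example_set` and the toy texts are made up for illustration, and it only does lowercasing and whitespace tokenizing (no stop word filter, no stemming).

```python
from collections import Counter


def texts_to_example_set(labeled_texts):
    """Turn (label, text) pairs into one example (row) per text.

    Hypothetical helper for illustration only; each example holds the
    label and the word counts of the lowercased, whitespace-split text.
    """
    examples = []
    for label, text in labeled_texts:
        tokens = text.lower().split()
        examples.append({"label": label, "words": Counter(tokens)})
    return examples


example_set = texts_to_example_set([
    ("pos", "Great great film"),
    ("neg", "Boring film"),
])

for example in example_set:
    print(example["label"], dict(example["words"]))
```

A learner then sees each of these rows as one training example, with the words as attributes and "pos"/"neg" as the label.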

If your question is whether a text may occur twice in the example set, with the same label and probably slightly different word values: yes, this works. BUT: the performance will probably not be as good as when training two SVMs, one for each stemming used.

Greetings,
Sebastian

AnjaF · September 2009

Thanks a lot Sebastian,

the question was whether you can train one SVM with two example sets. I decided to train two SVMs, but it's working.
Attached the XML for those who are interested.


<operator name="Root" class="Process" expanded="yes">
    <description text="#ylt#p#ygt# Transformations of the attribute space may ease learning in a way, that simple learning schemes may be able to learn complex functions. This is the basic idea of the kernel trick. But even without kernel based learning schemes the transformation of feature space may be necessary to reach good learning results. #ylt#/p#ygt#  #ylt#p#ygt# RapidMiner offers several different feature selection, construction, and extraction methods. This selection process (the well known forward selection) uses an inner cross validation for performance estimation. This building block serves as fitness evaluation for all candidate feature sets. Since the performance of a certain learning scheme is taken into account we refer to processes of this type as #yquot#wrapper approaches#yquot#.#ylt#/p#ygt#  #ylt#p#ygt#Additionally the process log operator plots intermediate results. You can inspect them online in the Results tab. Please refer to the visualization sample processes or the RapidMiner tutorial for further details.#ylt#/p#ygt#  #ylt#p#ygt# Try the following: #ylt#ul#ygt# #ylt#li#ygt#Start the process and change to #yquot#Result#yquot# view. There can be a plot selected. Plot the #yquot#performance#yquot# against the #yquot#generation#yquot# of the feature selection operator.#ylt#/li#ygt# #ylt#li#ygt#Select the feature selection operator in the tree view. Change the search directory from forward (forward selection) to backward (backward elimination). Restart the process. All features will be selected.#ylt#/li#ygt# #ylt#li#ygt#Select the feature selection operator. Right click to open the context menu and repace the operator by another feature selection scheme (for example a genetic algorithm).#ylt#/li#ygt# #ylt#li#ygt#Have a look at the list of the process log operator. Every time it is applied it collects the specified data. Please refer to the RapidMiner Tutorial for further explanations. 
After changing the feature selection operator to the genetic algorithm approach, you have to specify the correct values. #ylt#table#ygt##ylt#tr#ygt##ylt#td#ygt##ylt#icon#ygt#groups/24/visualization#ylt#/icon#ygt##ylt#/td#ygt##ylt#td#ygt##ylt#i#ygt#Use the process log operator to log values online.#ylt#/i#ygt##ylt#/td#ygt##ylt#/tr#ygt##ylt#/table#ygt# #ylt#/li#ygt# #ylt#/ul#ygt# #ylt#/p#ygt#"/>
    <operator name="TextInput" class="TextInput" expanded="yes">
        <list key="texts">
          <parameter key="pos"	value="..path..to..\eng\pos"/>
          <parameter key="neg"	value="..path..to..\eng\neg"/>
        </list>
        <parameter key="id_attribute_type"	value="short"/>
        <list key="namespaces">
        </list>
        <operator name="StringTokenizer" class="StringTokenizer">
        </operator>
        <operator name="ToLowerCaseConverter" class="ToLowerCaseConverter">
        </operator>
        <operator name="EnglishStopwordFilter" class="EnglishStopwordFilter">
        </operator>
        <operator name="PorterStemmer" class="PorterStemmer">
        </operator>
    </operator>
    <operator name="TextInput (2)" class="TextInput" expanded="yes">
        <list key="texts">
          <parameter key="pos"	value="..path..to..\ger\pos"/>
          <parameter key="neg"	value="..path..to..\ger\neg"/>
        </list>
        <parameter key="id_attribute_type"	value="short"/>
        <list key="namespaces">
        </list>
        <operator name="StringTokenizer (2)" class="StringTokenizer">
        </operator>
        <operator name="ToLowerCaseConverter (2)" class="ToLowerCaseConverter">
        </operator>
        <operator name="GermanStopwordFilter" class="GermanStopwordFilter">
        </operator>
        <operator name="GermanStemmer" class="GermanStemmer">
        </operator>
    </operator>
    <operator name="XValidation" class="XValidation" expanded="no">
        <parameter key="create_complete_model"	value="true"/>
        <parameter key="number_of_validations"	value="2"/>
        <operator name="LibSVMLearner" class="LibSVMLearner">
            <parameter key="kernel_type"	value="linear"/>
            <list key="class_weights">
            </list>
            <parameter key="confidence_for_multiclass"	value="false"/>
        </operator>
        <operator name="OperatorChain" class="OperatorChain" expanded="no">
            <operator name="ModelApplier" class="ModelApplier">
                <list key="application_parameters">
                </list>
            </operator>
            <operator name="PerformanceEvaluator" class="PerformanceEvaluator">
                <parameter key="keep_example_set"	value="true"/>
                <parameter key="relative_error"	value="true"/>
                <list key="class_weights">
                </list>
            </operator>
        </operator>
    </operator>
</operator>

Best regards,
Anja

land · September 2009

Hi Anja,
yes, it works, but it only trains on one ExampleSet. To train on all the data, you would have to merge the two sets. Use the ExampleSetMerge operator for this.
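
As a rough illustration of what merging means for two text example sets, here is a small Python sketch. It is not the ExampleSetMerge implementation: `merge_example_sets` and the toy data are hypothetical, and the sketch assumes the merged table uses the union of both vocabularies with 0 for words a text does not contain. Whether ExampleSetMerge treats differing attributes this way is not confirmed in this thread, so check the operator's requirements before relying on it.

```python
def merge_example_sets(*example_sets):
    """Concatenate example sets over the union of their attributes.

    Hypothetical helper for illustration only. Each example is a dict
    with a "label" and a "words" dict of counts. Words that do not occur
    in a text get the value 0 in the merged table (an assumption).
    """
    vocabulary = sorted(
        {word for es in example_sets for ex in es for word in ex["words"]}
    )
    rows, labels = [], []
    for es in example_sets:
        for ex in es:
            rows.append([ex["words"].get(word, 0) for word in vocabulary])
            labels.append(ex["label"])
    return vocabulary, rows, labels


# Toy example sets standing in for the output of the two TextInput operators.
english = [{"label": "pos", "words": {"good": 2, "film": 1}}]
german = [{"label": "neg", "words": {"schlecht": 1, "film": 1}}]

vocabulary, rows, labels = merge_example_sets(english, german)
print(vocabulary)
print(rows)
print(labels)
```

The point is that a single learner needs one table with one shared set of attributes, so the English and German word attributes have to end up side by side before training.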

Greetings,
Sebastian
