Due to recent updates, all users are required to create an Altair One account to login to the RapidMiner community. Click the Register button to create your account using the same email that you have previously used to login to the RapidMiner community. This will ensure that any previously created content will be synced to your Altair One account. Once you login, you will be asked to provide a username that identifies you to other Community users. Email us at Community with questions.

"text mining (classification

mksaadmksaad Member Posts: 42 Maven
edited May 2019 in Help
Hello all,

I read many tutorials about text mining (TM) including tutorials about TM using RM.

most of these tutorials uses support vector machine (SVM) and Naive-Bayes (NB) as classification methods. I conclude they are the best Algorithm for text classification.
do you recommend me to use these algorithm or there are other suitable algorithms for text classification. (I am looking for Algorithms that implemented in RM)
If SVM and NB are the best one, any references about that will be appreciated.


I also appreciate any recommendation of RM clustering algorithms for text.


Thanks in advance,
--
Motaz K. Saad

Answers

  • landland RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    I would suggest any clustering algorithm supporting the Cosine Similarity. And as always KMeans is worth a try.

    Greetings,
      Sebastian
  • gunjanamitgunjanamit Member Posts: 28 Contributor II
    Motaz,

    Have you done anything on Text Classification?

    I need help there...
  • mksaadmksaad Member Posts: 42 Maven
    Hello,

    You can take a look at http://sites.google.com/site/motazsite/publications

    you can find there conclusions on Arabic text classification and conclusions text classification in general.


    Regards,
    Motaz
  • jforrjforr Member Posts: 7 Contributor II
    Is there a good algorithm to use when my documents can have multiple categories assigned to them?  An example might be resumes where some are Java developers, some are SQL developers, and some are both Java and SQL developers?
  • MariusHelfMariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi, you can use Polynominal by Binominal Classification for this. This operator trains a model based on its inner process, where it tries to discriminate between each class and all other classes. During application the confidence for each class is calculated, and the one with the highest value is predicted. Please have a look at the attached process.

    Best, Marius
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.2.006">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.2.006" expanded="true" name="Process">
        <process expanded="true" height="494" width="752">
          <operator activated="true" class="generate_data" compatibility="5.2.006" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
            <parameter key="target_function" value="three ring clusters"/>
            <parameter key="number_of_attributes" value="2"/>
          </operator>
          <operator activated="true" class="polynomial_by_binomial_classification" compatibility="5.2.006" expanded="true" height="76" name="Polynominal by Binominal Classification" width="90" x="246" y="30">
            <process expanded="true" height="512" width="770">
              <operator activated="true" class="naive_bayes" compatibility="5.2.006" expanded="true" height="76" name="Naive Bayes" width="90" x="313" y="30"/>
              <connect from_port="training set" to_op="Naive Bayes" to_port="training set"/>
              <connect from_op="Naive Bayes" from_port="model" to_port="model"/>
              <portSpacing port="source_training set" spacing="0"/>
              <portSpacing port="sink_model" spacing="0"/>
            </process>
          </operator>
          <operator activated="true" class="apply_model" compatibility="5.2.006" expanded="true" height="76" name="Apply Model" width="90" x="461" y="30">
            <list key="application_parameters"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Polynominal by Binominal Classification" to_port="training set"/>
          <connect from_op="Polynominal by Binominal Classification" from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_op="Polynominal by Binominal Classification" from_port="example set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_port="result 2"/>
          <connect from_op="Apply Model" from_port="model" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
          <portSpacing port="sink_result 3" spacing="0"/>
        </process>
      </operator>
    </process>
  • jforrjforr Member Posts: 7 Contributor II
    Thanks, I'll try that.
Sign In or Register to comment.