Text mining (classification)

mksaad · August 2009

Hello all,

I have read many tutorials about text mining (TM), including tutorials about TM using RM.

Most of these tutorials use support vector machines (SVM) and Naive Bayes (NB) as classification methods, so I conclude they are the best algorithms for text classification.
Do you recommend that I use these algorithms, or are there other suitable algorithms for text classification? (I am looking for algorithms that are implemented in RM.)
If SVM and NB are the best ones, any references on that would be appreciated.

I would also appreciate any recommendations of RM clustering algorithms for text.

Thanks in advance,
--
Motaz K. Saad

land · August 2009

Hi,
I would suggest any clustering algorithm that supports cosine similarity. And, as always, KMeans is worth a try.
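
For readers unfamiliar with the measure, here is a minimal Python sketch of cosine similarity on bag-of-words vectors. It is only an illustration outside RM: the helper names `term_vector` and `cosine_similarity` and the three sample texts are made up for this example, and it uses plain whitespace tokenization rather than any RM text processing.

```python
import math
from collections import Counter


def term_vector(text):
    """Turn a text into a bag-of-words term-frequency vector."""
    return Counter(text.lower().split())


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse term-frequency vectors."""
    # Counter returns 0 for terms that are missing from b.
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # an empty document is treated as similar to nothing
    return dot / (norm_a * norm_b)


# Hypothetical sample documents, invented for this sketch.
docs = [
    "java developer with spring experience",
    "senior java developer",
    "sql database administrator",
]
vectors = [term_vector(d) for d in docs]
sim_java = cosine_similarity(vectors[0], vectors[1])  # two shared terms
sim_sql = cosine_similarity(vectors[0], vectors[2])   # no shared terms
```

Because the measure depends only on the direction of the vectors and not on their length, a long and a short document about the same topic can still score as similar, which is usually what you want for text.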

Greetings,
Sebastian

gunjanamit · June 2012

Motaz,

Have you done anything on Text Classification?

I need help there...

mksaad · June 2012

Hello,

You can take a look at http://sites.google.com/site/motazsite/publications

There you can find conclusions on Arabic text classification and on text classification in general.

Regards,
Motaz

jforr · July 2012

Is there a good algorithm to use when my documents can have multiple categories assigned to them? An example might be resumes, where some are for Java developers, some are for SQL developers, and some are for both Java and SQL developers.

MariusHelf · July 2012

Hi, you can use Polynominal by Binominal Classification for this. This operator trains a model based on its inner process, where it tries to discriminate between each class and all other classes. During application, the confidence for each class is calculated, and the class with the highest confidence is predicted. Please have a look at the attached process.

Best, Marius

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.2.006">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="5.2.006" expanded="true" name="Process">
    <process expanded="true" height="494" width="752">
      <operator activated="true" class="generate_data" compatibility="5.2.006" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
        <parameter key="target_function" value="three ring clusters"/>
        <parameter key="number_of_attributes" value="2"/>
      </operator>
      <operator activated="true" class="polynomial_by_binomial_classification" compatibility="5.2.006" expanded="true" height="76" name="Polynominal by Binominal Classification" width="90" x="246" y="30">
        <process expanded="true" height="512" width="770">
          <operator activated="true" class="naive_bayes" compatibility="5.2.006" expanded="true" height="76" name="Naive Bayes" width="90" x="313" y="30"/>
          <connect from_port="training set" to_op="Naive Bayes" to_port="training set"/>
          <connect from_op="Naive Bayes" from_port="model" to_port="model"/>
          <portSpacing port="source_training set" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="apply_model" compatibility="5.2.006" expanded="true" height="76" name="Apply Model" width="90" x="461" y="30">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="Generate Data" from_port="output" to_op="Polynominal by Binominal Classification" to_port="training set"/>
      <connect from_op="Polynominal by Binominal Classification" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="Polynominal by Binominal Classification" from_port="example set" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Apply Model" from_port="labelled data" to_port="result 2"/>
      <connect from_op="Apply Model" from_port="model" to_port="result 1"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
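
The one-against-all scheme described above can also be sketched outside RM. The Python below is only an illustration under stated assumptions: the binary learner is a toy nearest-centroid scorer standing in for Naive Bayes, and all function names and the sample points are hypothetical, not part of RM or of the process above.

```python
import math


def distance(a, b):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def train_binary(points, is_positive):
    """Toy binary learner (stand-in for Naive Bayes): one centroid per side."""
    def centroid(rows):
        return [sum(col) / len(rows) for col in zip(*rows)]
    pos = [p for p, flag in zip(points, is_positive) if flag]
    neg = [p for p, flag in zip(points, is_positive) if not flag]
    return centroid(pos), centroid(neg)


def confidence(model, point):
    """Confidence in [0, 1] that the point belongs to the positive side."""
    pos_c, neg_c = model
    d_pos = distance(point, pos_c)
    d_neg = distance(point, neg_c)
    if d_pos + d_neg == 0:
        return 0.5
    return d_neg / (d_pos + d_neg)


def train_one_vs_all(points, labels):
    """Train one binary model per class: that class against all others."""
    return {
        cls: train_binary(points, [lab == cls for lab in labels])
        for cls in sorted(set(labels))
    }


def predict(models, point):
    """Score every class; return the most confident class and all scores."""
    scores = {cls: confidence(m, point) for cls, m in models.items()}
    return max(scores, key=scores.get), scores


# Hypothetical two-attribute training data with three classes.
points = [(0.0, 0.0), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (0.0, 5.0), (0.1, 5.2)]
labels = ["a", "a", "b", "b", "c", "c"]
models = train_one_vs_all(points, labels)
predicted, scores = predict(models, (4.8, 5.1))
```

Here `predict` keeps only the class with the highest confidence, as in the description above. If a document may carry several labels at once, one possible variation (an assumption of this sketch, not something the thread says the operator does) is to keep every class whose score passes a chosen threshold.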

jforr · July 2012

Thanks, I'll try that.
