RapidMiner

Machine Learning, Prediction-SVM,Data-Mining

Regular Contributor


Hi everyone,

I have uploaded my process so that you can see it.

I used two datasets. The first (Opn_Score) is for building the SVM model: it contains 9605 examples from 250 Twitter users and has two columns, Text (the tweet) and Score (the psychological score of the tweet). To build the model I used Score as the label. I also used Split Data with a 0.7/0.3 ratio so that the model is not overfitted.

Finally, I fetched celebrity tweets from Twitter as a test set and applied the model to predict their personality, i.e. how open or how shy they are on a scale of 1 to 5. The average test result for Gal Gadot is 4.229, but when I fetched more celebrities I still got very similar results: Heidi Klum 4.207, Hillary Clinton 4.229, Donald Trump 4.206, Leonardo DiCaprio 4.209. As you can see, all of them are very close to each other, so I don't understand what exactly I did wrong.

I need to correct this as soon as possible, so it would be great if any of you could reply soon. Please check the PNG file for a better explanation. Thanks in advance!

Regards,

Arnab


20 REPLIES
Moderator

Re: Machine Learning, Prediction-SVM,Data-Mining

Did you measure the performance of your model? What does the performance say?

Regular Contributor

Re: Machine Learning, Prediction-SVM,Data-Mining

I have now evaluated the performance of the model; the RMSE is 0.594.

Moderator

Re: Machine Learning, Prediction-SVM,Data-Mining

What is the range of your labels? Something like 0 to 5?

Your RMSE may indicate a poorly fit model.
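
One way to put an RMSE figure in context is to compare it with a trivial baseline that ignores the text and always predicts the mean label. Below is a minimal Python sketch of that comparison. The scores are made-up values on a 1-to-5 scale, not the data from this thread, and `rmse` is a helper defined here for illustration.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))

# Hypothetical labels on a 1-to-5 scale (illustrative values only).
scores = [3.5, 4.0, 4.5, 4.0, 5.0, 3.0, 4.5, 4.0]

# Baseline: ignore the input and always predict the mean label.
mean_score = sum(scores) / len(scores)
baseline_rmse = rmse(scores, [mean_score] * len(scores))

print(f"mean label    = {mean_score:.3f}")
print(f"baseline RMSE = {baseline_rmse:.3f}")
```

If a model's RMSE is close to this baseline, it has probably learned little beyond the average score, and it would then tend to output almost the same value for every new example.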

Regular Contributor

Re: Machine Learning, Prediction-SVM,Data-Mining

The range is 1 to 5, and according to the paper below, an RMSE below 0.88 on this kind of data is considered good.

Paper ref: https://www.cl.cam.ac.uk/~dq209/publications/quercia11twitter.pdf

Moderator

Re: Machine Learning, Prediction-SVM,Data-Mining

Try measuring the performance using Cross Validation. I suspect that how you split the data affects your performance and results. CV is the best way to go, IMHO. I noticed that the analysis in that paper was done using CV and the M5 Rules decision tree algorithm, so your SVM (Linear) might do better or worse.
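
To make the idea concrete, here is a rough Python sketch of k-fold cross validation, independent of RapidMiner. Everything in it is an assumption for illustration: the labels are made up, the function names are hypothetical, and the "model" is just a stand-in that predicts the mean of the training fold where a real learner (SVM, M5 Rules, ...) would be trained.

```python
import math
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle the indices 0..n-1 and deal them into k roughly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_validated_rmse(labels, k=5):
    """Average test RMSE over k folds for a stand-in model (training-fold mean)."""
    folds = k_fold_indices(len(labels), k)
    fold_rmses = []
    for test_idx in folds:
        held_out = set(test_idx)
        train = [y for i, y in enumerate(labels) if i not in held_out]
        prediction = sum(train) / len(train)  # stand-in for training a real learner
        errors = [(labels[i] - prediction) ** 2 for i in test_idx]
        fold_rmses.append(math.sqrt(sum(errors) / len(errors)))
    return sum(fold_rmses) / k

# Hypothetical 1-to-5 scores, for illustration only.
labels = [3.5, 4.0, 4.5, 4.0, 5.0, 3.0, 4.5, 4.0, 3.5, 4.5]
print(round(cross_validated_rmse(labels, k=5), 3))
```

Every example is used for testing exactly once, so the averaged error depends less on one lucky or unlucky 0.7/0.3 split.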

Regular Contributor

Re: Machine Learning, Prediction-SVM,Data-Mining

I have uploaded the scores as well, so you can take a look.

Thank you for your help!

Regular Contributor

Re: Machine Learning, Prediction-SVM,Data-Mining

Do you mean I should use Cross Validation for building the model? And which algorithm do you think I should use inside it?

The problem is that when I build the model with CV and give it any data set, it always says the example set does not match the training set.

Moderator

Re: Machine Learning, Prediction-SVM,Data-Mining

Try a setup like this. 

 

<?xml version="1.0" encoding="UTF-8"?><process version="7.6.000">
  <context>
    <input/>
    <output/>
    <macros/>
  </context>
  <operator activated="true" class="process" compatibility="7.6.000" expanded="true" name="Process">
    <process expanded="true">
      <operator activated="true" class="social_media:search_twitter" compatibility="7.3.000" expanded="true" height="68" name="Search Twitter" width="90" x="45" y="34">
        <parameter key="connection" value="Twitter - Studio Connection"/>
        <parameter key="query" value="machinelearning"/>
        <parameter key="limit" value="1000"/>
      </operator>
      <operator activated="true" class="generate_attributes" compatibility="7.6.000" expanded="true" height="82" name="Generate Attributes" width="90" x="179" y="34">
        <list key="function_descriptions">
          <parameter key="label" value="if([Retweet-Count]&gt;5,1,0)"/>
        </list>
      </operator>
      <operator activated="true" class="set_role" compatibility="7.6.000" expanded="true" height="82" name="Set Role" width="90" x="313" y="34">
        <parameter key="attribute_name" value="label"/>
        <parameter key="target_role" value="label"/>
        <list key="set_additional_roles"/>
      </operator>
      <operator activated="true" class="select_attributes" compatibility="7.6.000" expanded="true" height="82" name="Select Attributes" width="90" x="447" y="34">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Text"/>
      </operator>
      <operator activated="true" class="nominal_to_text" compatibility="7.6.000" expanded="true" height="82" name="Nominal to Text" width="90" x="581" y="34">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="Text"/>
      </operator>
      <operator activated="true" class="multiply" compatibility="7.6.000" expanded="true" height="103" name="Multiply" width="90" x="581" y="238"/>
      <operator activated="true" class="select_attributes" compatibility="7.6.000" expanded="true" height="82" name="Select Attributes (2)" width="90" x="715" y="289">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="label"/>
        <parameter key="invert_selection" value="true"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="text:process_document_from_data" compatibility="7.5.000" expanded="true" height="82" name="Process Documents from Data" width="90" x="849" y="34">
        <parameter key="prune_method" value="percentual"/>
        <list key="specify_weights"/>
        <process expanded="true">
          <operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize" width="90" x="246" y="34"/>
          <connect from_port="document" to_op="Tokenize" to_port="document"/>
          <connect from_op="Tokenize" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="text:process_document_from_data" compatibility="7.5.000" expanded="true" height="82" name="Process Documents from Data (2)" width="90" x="849" y="289">
        <parameter key="prune_method" value="percentual"/>
        <list key="specify_weights"/>
        <process expanded="true">
          <operator activated="true" class="text:tokenize" compatibility="7.5.000" expanded="true" height="68" name="Tokenize (2)" width="90" x="246" y="34"/>
          <connect from_port="document" to_op="Tokenize (2)" to_port="document"/>
          <connect from_op="Tokenize (2)" from_port="document" to_port="document 1"/>
          <portSpacing port="source_document" spacing="0"/>
          <portSpacing port="sink_document 1" spacing="0"/>
          <portSpacing port="sink_document 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="concurrency:cross_validation" compatibility="7.6.000" expanded="true" height="145" name="Validation" width="90" x="983" y="34">
        <parameter key="sampling_type" value="shuffled sampling"/>
        <process expanded="true">
          <operator activated="true" class="weka:W-M5Rules" compatibility="7.3.000" expanded="true" height="82" name="W-M5Rules" width="90" x="253" y="34"/>
          <connect from_port="training set" to_op="W-M5Rules" to_port="training set"/>
          <connect from_op="W-M5Rules" from_port="model" to_port="model"/>
          <portSpacing port="source_training set" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
          <description align="left" color="green" colored="true" height="113" resized="true" width="284" x="33" y="148">Builds a model on the current training data set (90 % of the data by default, 10 times).&lt;br&gt;&lt;br&gt;Make sure that you only put numerical attributes into a linear regression!</description>
        </process>
        <process expanded="true">
          <operator activated="true" class="apply_model" compatibility="7.6.000" expanded="true" height="82" name="Apply Model" width="90" x="45" y="34">
            <list key="application_parameters"/>
          </operator>
          <operator activated="true" class="performance" compatibility="7.6.000" expanded="true" height="82" name="Performance" width="90" x="179" y="34"/>
          <connect from_port="model" to_op="Apply Model" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
          <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
          <connect from_op="Performance" from_port="example set" to_port="test set results"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="sink_test set results" spacing="0"/>
          <portSpacing port="sink_performance 1" spacing="0"/>
          <portSpacing port="sink_performance 2" spacing="0"/>
          <description align="left" color="blue" colored="true" height="107" resized="true" width="333" x="28" y="139">Applies the model built from the training data set on the current test set (10 % by default).&lt;br/&gt;The Performance operator calculates performance indicators and sends them to the operator result.</description>
        </process>
        <description align="center" color="transparent" colored="false" width="126">A cross validation using the W-M5Rules learner.</description>
      </operator>
      <operator activated="true" class="apply_model" compatibility="7.6.000" expanded="true" height="82" name="Apply Model (2)" width="90" x="1184" y="238">
        <list key="application_parameters"/>
      </operator>
      <connect from_op="Search Twitter" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
      <connect from_op="Generate Attributes" from_port="example set output" to_op="Set Role" to_port="example set input"/>
      <connect from_op="Set Role" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
      <connect from_op="Select Attributes" from_port="example set output" to_op="Nominal to Text" to_port="example set input"/>
      <connect from_op="Nominal to Text" from_port="example set output" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_op="Process Documents from Data" to_port="example set"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Select Attributes (2)" to_port="example set input"/>
      <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Process Documents from Data (2)" to_port="example set"/>
      <connect from_op="Process Documents from Data" from_port="example set" to_op="Validation" to_port="example set"/>
      <connect from_op="Process Documents from Data" from_port="word list" to_op="Process Documents from Data (2)" to_port="word list"/>
      <connect from_op="Process Documents from Data (2)" from_port="example set" to_op="Apply Model (2)" to_port="unlabelled data"/>
      <connect from_op="Validation" from_port="model" to_op="Apply Model (2)" to_port="model"/>
      <connect from_op="Validation" from_port="performance 1" to_port="result 1"/>
      <connect from_op="Apply Model (2)" from_port="labelled data" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="0"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
    </process>
  </operator>
</process>
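
A note on the wiring in the process above: the word list output of the first Process Documents from Data operator is connected to the word list input of the second one, so the new texts are turned into the same attributes the model was trained on. A mismatch between those attributes is a likely cause of an "example set does not match the training set" message. Below is a rough Python sketch of the idea, not RapidMiner's implementation: the texts are made up, the helper names are hypothetical, and plain word counts stand in for whatever weighting the operator actually applies.

```python
import re
from collections import Counter

def tokenize(text):
    """Lowercase and keep runs of letters (a crude stand-in for Tokenize)."""
    return re.findall(r"[a-z]+", text.lower())

def build_word_list(texts):
    """Collect the sorted vocabulary seen in the training texts."""
    return sorted({tok for t in texts for tok in tokenize(t)})

def vectorize(text, word_list):
    """Count only words from the fixed word list; unseen words are ignored."""
    counts = Counter(tokenize(text))
    return [counts[w] for w in word_list]

train_texts = ["I love new ideas", "new art is great"]  # hypothetical training tweets
word_list = build_word_list(train_texts)

new_text = "Great ideas need great coffee"              # hypothetical unseen tweet
print(word_list)
print(vectorize(new_text, word_list))
```

Because the vector for the new text is built from the training word list, it always has the same columns in the same order as the training data, regardless of which words the new text contains.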
Regular Contributor

Re: Machine Learning, Prediction-SVM,Data-Mining

Dear Thomas,

I tried to use this setup, but I really did not understand how to apply it to my problem.

I am sorry for asking so much; I am really stuck on this problem.