
"SVM Regression returning same values for all test records ?!?!"

noah977 Member Posts: 32 Maven
edited May 2019 in Help
I setup a nice SVM using nu-svr in RM.
As a test I trained it on a sparse data set containing 1000 records.

Then, I tested it against a new data set of about 14 records.

Every record of the test set returned the exact same prediction.  This seems highly unlikely since there are over 140 dimensions to the SVM and a significant amount of variation in the data. 

One guess is that I'm not loading the sparse data correctly for testing.

I can't seem to discover where my error is.  Maybe someone here can offer some help/suggestions.

Here is the training XML
<?xml version="1.0" encoding="MacRoman"?>
<process version="4.3">

  <operator name="Root" class="Process" expanded="yes">
      <operator name="SparseFormatExampleSource" class="SparseFormatExampleSource">
          <parameter key="data_file" value="/Users/noah/train_sparse.txt"/>
          <parameter key="dimension" value="140"/>
          <parameter key="format" value="yx"/>
      </operator>
      <operator name="AttributeSubsetPreprocessing" class="AttributeSubsetPreprocessing" expanded="yes">
          <parameter key="attribute_name_regex" value="label"/>
          <parameter key="condition_class" value="is_nominal"/>
          <parameter key="process_special_attributes" value="true"/>
          <operator name="NominalNumbers2Numerical" class="NominalNumbers2Numerical">
          </operator>
      </operator>
      <operator name="LibSVMLearner" class="LibSVMLearner">
          <parameter key="C" value="100.0"/>
          <parameter key="gamma" value="0.1"/>
          <parameter key="keep_example_set" value="true"/>
          <parameter key="svm_type" value="nu-SVR"/>
      </operator>
      <operator name="ModelWriter" class="ModelWriter">
          <parameter key="model_file" value="/Users/noah/sparse_small.mod"/>
      </operator>
      <operator name="ModelApplier" class="ModelApplier">
          <list key="application_parameters">
          </list>
          <parameter key="create_view" value="true"/>
          <parameter key="keep_model" value="true"/>
      </operator>
      <operator name="RegressionPerformance" class="RegressionPerformance">
          <parameter key="absolute_error" value="true"/>
          <parameter key="keep_example_set" value="true"/>
          <parameter key="prediction_average" value="true"/>
          <parameter key="relative_error" value="true"/>
          <parameter key="relative_error_lenient" value="true"/>
          <parameter key="root_mean_squared_error" value="true"/>
      </operator>
  </operator>

</process>
Here are 2 rows of training data

0.99307958477511 1:2 2:12 3:0.982609455619486 4:0 5:14 6:5 7:0.8 8:0.0348258706467662 9:201 10:0.0496977837474815 11:1489 12:1 13:1 14:0.00477630731561417 15:133 16:10.81 17:5.5 101:1 116:1 117:1 119:1 125:1
0.989655172413817 1:3 2:2 3:0.973641810178274 4:0 5:63 6:3 7:1 8:0.0631443298969072 9:776 10:0.0769704433497537 11:1624 12:1 13:0.5 14:0.0049596226732805 15:123 16:-0.09 17:6 101:1 116:1 117:1 119:1 125:1
Here is the test XML
<?xml version="1.0" encoding="MacRoman"?>
<process version="4.3">

  <operator name="Root" class="Process" expanded="yes">
      <operator name="SparseFormatExampleSource" class="SparseFormatExampleSource">
          <parameter key="data_file" value="/Users/noah/test.txt"/>
          <parameter key="dimension" value="141"/>
          <parameter key="format" value="yx"/>
      </operator>
      <operator name="ModelLoader" class="ModelLoader">
          <parameter key="model_file" value="/Users/noah/sparse_c4_1000.mod"/>
      </operator>
      <operator name="ModelApplier" class="ModelApplier">
          <list key="application_parameters">
          </list>
          <parameter key="create_view" value="true"/>
          <parameter key="keep_model" value="true"/>
      </operator>
  </operator>

</process>
Here are 2 rows of test data
1:0 2:14 3:0.979392741314451 4:0.0909090909090909 5:28 6:22 7:0.227272727272727 8:0.0436046511627907 9:1376 10:0.0735090152565881 11:1442 12:0 13:2 14:0.0104266852405951 15:133 16:9.64 17:8.09 103:1 116:1 117:1 119:1 125:1
1:0 2:1 3:0.980626115895827 4:0.0357142857142857 5:20 6:28 7:0.178571428571429 8:0.0338541666666667 9:768 10:0.0653008962868118 11:781 12:0.321428571428571 13:0.2 14:0.0067155135256289 15:130 16:6.64 17:8.32 102:1 111:1 117:1 119:1 125:1
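
To rule out the loading step, one quick sanity check (just a sketch; the ExampleSetWriter output path is a placeholder) is to read the sparse file and immediately write the parsed example set back out, then compare it by eye against the raw rows:

<?xml version="1.0" encoding="MacRoman"?>
<process version="4.3">

  <operator name="Root" class="Process" expanded="yes">
      <operator name="SparseFormatExampleSource" class="SparseFormatExampleSource">
          <parameter key="data_file" value="/Users/noah/test.txt"/>
          <parameter key="dimension" value="141"/>
          <parameter key="format" value="yx"/>
      </operator>
      <!-- write the parsed example set to a flat file for manual inspection -->
      <operator name="ExampleSetWriter" class="ExampleSetWriter">
          <parameter key="example_set_file" value="/Users/noah/test_parsed.dat"/>
      </operator>
  </operator>

</process>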

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    this process does not contain any obvious errors (to cite my favorite error message). Perhaps you only need to tune the SVM parameters?
    As a second hint: it is much more convenient to use the built-in validation operators instead of splitting the data manually and using two processes. You could use XValidation, which is explained in the 04_Validation/03_XValidation_Numerical.xml sample in the samples directory.
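    A rough sketch of what that layout could look like, modeled on the structure of the 4.x sample (treat it as a sketch: the exact parameter keys, e.g. number_of_validations, may differ in your version):

    <?xml version="1.0" encoding="MacRoman"?>
    <process version="4.3">
      <operator name="Root" class="Process" expanded="yes">
          <operator name="SparseFormatExampleSource" class="SparseFormatExampleSource">
              <parameter key="data_file" value="/Users/noah/train_sparse.txt"/>
              <parameter key="dimension" value="140"/>
              <parameter key="format" value="yx"/>
          </operator>
          <!-- 10-fold cross-validation: the first child trains, the second applies and evaluates -->
          <operator name="XValidation" class="XValidation" expanded="yes">
              <parameter key="number_of_validations" value="10"/>
              <operator name="LibSVMLearner" class="LibSVMLearner">
                  <parameter key="C" value="100.0"/>
                  <parameter key="gamma" value="0.1"/>
                  <parameter key="svm_type" value="nu-SVR"/>
              </operator>
              <operator name="ApplierChain" class="OperatorChain" expanded="yes">
                  <operator name="ModelApplier" class="ModelApplier">
                      <list key="application_parameters">
                      </list>
                  </operator>
                  <operator name="RegressionPerformance" class="RegressionPerformance">
                      <parameter key="root_mean_squared_error" value="true"/>
                  </operator>
              </operator>
          </operator>
      </operator>
    </process>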
    To tune your SVM parameters you could take a look at the 07_Meta/01_ParameterOptimization sample.

    Greetings,
      Sebastian
  • noah977 Member Posts: 32 Maven
    Sebastian,

    I HAVE performed the parameter optimization and cross-validation to build a good model.

    What you are seeing in my earlier post is the model applied to "real-world" data. This was an actual application of the SVM to learn something about unlabeled data.

    My concern is this: if the cross-validation during training showed decent results, why would the SVM predict the exact same output for the REAL data?? It is possible, but highly unlikely.

    -N
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    this is strange indeed. Is the attribute header exactly the same as in the training set?
    Without the data I can't imagine any other possible error, since I cannot reproduce the behavior.

    Greetings,
      Sebastian
  • noah977 Member Posts: 32 Maven
    Sebastian,

    I think I found the problem. My data is a two-class problem.
    14% is class 1
    86% is class 0

    From what I've recently read, having "unbalanced" training sets can cause the SVM to develop a model that heavily favors the larger class.  This would explain the results I've been seeing.

    My question is:  Is there a way to have RapidMiner weight the classes or account for the unbalanced training data?

    Thanks,

    -N
  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    there is an operator called EqualLabelWeighting which distributes an equal total weight across all classes. Hence examples of a dominating class will be down-weighted.
    But you will then need a learner capable of using example weights. You should check this in the operator info of the learning operator.
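    For illustration, a minimal sketch of how this could be wired together (JMySVMLearner is just an example of a learner that, as far as I know, can use example weights; check its operator info to confirm, and treat the paths as placeholders):

    <?xml version="1.0" encoding="MacRoman"?>
    <process version="4.3">
      <operator name="Root" class="Process" expanded="yes">
          <operator name="SparseFormatExampleSource" class="SparseFormatExampleSource">
              <parameter key="data_file" value="/Users/noah/train_sparse.txt"/>
              <parameter key="dimension" value="140"/>
              <parameter key="format" value="yx"/>
          </operator>
          <!-- give every class the same total weight, down-weighting examples of the dominating class -->
          <operator name="EqualLabelWeighting" class="EqualLabelWeighting">
          </operator>
          <!-- a learner that supports example weights (check the operator info) -->
          <operator name="JMySVMLearner" class="JMySVMLearner">
          </operator>
      </operator>
    </process>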

    Greetings,
      Sebastian