"Not getting confidence values from LibSVM [SOLVED]"

javart Member Posts: 2 Contributor I
edited June 2019 in Help
I'm running a multi-class classification experiment using LibSVM.
When I check the classification output from the trained model, I see predicted labels, but all the confidence values are equal to zero.
I have checked the parameter "calculate confidences" in the LibSVM operator. Am I missing something?
Below is the XML for my process, along with a few lines of my input data.

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<process version="5.3.012">
 <context>
   <input/>
   <output/>
   <macros/>
 </context>
 <operator activated="true" class="process" compatibility="5.3.012" expanded="true" name="Process">
   <parameter key="logverbosity" value="status"/>
   <parameter key="logfile" value="log"/>
   <parameter key="resultfile" value="result"/>
   <process expanded="true">
     <operator activated="true" class="read_sparse" compatibility="5.3.012" expanded="true" height="60" name="Read Sparse" width="90" x="112" y="120">
       <parameter key="format" value="yx"/>
       <parameter key="data_file" value="/home/javier/workspace/Taxonomy Integration/data/machineLearning/100012.dat.3"/>
       <parameter key="dimension" value="70000"/>
       <parameter key="datamanagement" value="int_sparse_array"/>
       <list key="prefix_map"/>
     </operator>
     <operator activated="true" class="split_validation" compatibility="5.3.012" expanded="true" height="112" name="Validation" width="90" x="313" y="120">
       <parameter key="split_ratio" value="0.8"/>
       <parameter key="training_set_size" value="1000"/>
       <parameter key="test_set_size" value="1000"/>
       <parameter key="sampling_type" value="stratified sampling"/>
       <parameter key="use_local_random_seed" value="true"/>
       <process expanded="true">
         <operator activated="true" class="support_vector_machine_libsvm" compatibility="5.3.012" expanded="true" height="76" name="SVM" width="90" x="112" y="30">
           <parameter key="kernel_type" value="linear"/>
           <list key="class_weights"/>
           <parameter key="calculate_confidences" value="true"/>
         </operator>
         <connect from_port="training" to_op="SVM" to_port="training set"/>
         <connect from_op="SVM" from_port="model" to_port="model"/>
         <portSpacing port="source_training" spacing="0"/>
         <portSpacing port="sink_model" spacing="0"/>
         <portSpacing port="sink_through 1" spacing="0"/>
       </process>
       <process expanded="true">
         <operator activated="true" class="apply_model" compatibility="5.3.012" expanded="true" height="76" name="Apply Model" width="90" x="112" y="30">
           <list key="application_parameters"/>
         </operator>
         <operator activated="true" class="write_model" compatibility="5.3.012" expanded="true" height="60" name="Write Model" width="90" x="246" y="165">
           <parameter key="model_file" value="model.mod"/>
           <parameter key="output_type" value="Binary"/>
         </operator>
         <operator activated="true" breakpoints="after" class="performance_classification" compatibility="5.3.012" expanded="true" height="76" name="Performance" width="90" x="246" y="30">
           <list key="class_weights"/>
         </operator>
         <connect from_port="model" to_op="Apply Model" to_port="model"/>
         <connect from_port="test set" to_op="Apply Model" to_port="unlabelled data"/>
         <connect from_op="Apply Model" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
         <connect from_op="Apply Model" from_port="model" to_op="Write Model" to_port="input"/>
         <connect from_op="Performance" from_port="performance" to_port="averagable 1"/>
         <portSpacing port="source_model" spacing="0"/>
         <portSpacing port="source_test set" spacing="0"/>
         <portSpacing port="source_through 1" spacing="0"/>
         <portSpacing port="sink_averagable 1" spacing="0"/>
         <portSpacing port="sink_averagable 2" spacing="0"/>
       </process>
     </operator>
     <operator activated="true" class="write_performance" compatibility="5.3.012" expanded="true" height="60" name="Write Performance" width="90" x="514" y="120">
       <parameter key="performance_file" value="performance.per"/>
     </operator>
     <connect from_op="Read Sparse" from_port="output" to_op="Validation" to_port="training"/>
     <connect from_op="Validation" from_port="averagable 1" to_op="Write Performance" to_port="input"/>
     <portSpacing port="source_input 1" spacing="0"/>
     <portSpacing port="sink_result 1" spacing="0"/>
   </process>
 </operator>
</process>

The input data looks like this:

553124  2266:-1 8045:-1 9392:-1 10397:-1 13481:1 14509:-1 17368:1 18888:1 26913:1 27083:1 27107:1 27122:-1 27859:-1 37441:1 37993:1 40703:1 48407:-1 61367:-1
553124  8549:-1 13902:-1 21611:-1 23697:-1 36878:1 40703:1 42809:-1 55147:1 55972:-1 56351:1 62848:-1
553124  2092:1 2536:-1 10411:3 12125:-1 27555:1 32520:-1 36916:1 40080:-1 40703:1 41936:1 42809:-1 43505:-1 44430:-1 46301:-1 49588:-1 54999:1 56521:1 61488:-1 61793:-1
553124  7788:1 14296:-1 22385:1 26071:-1 32520:-1 32816:-1 35700:1 39122:1 53325:-1 54817:-1
553124  1658:-1 1867:-1 2092:1 2213:-1 4929:1 5356:1 8549:-1 9381:1 11392:-1 12125:-1 13234:-1 17874:-1 20346:-1 29660:-1 31941:-1 35387:1 36916:1 40703:2 41936:1 42809:-2 43985:-1 45613:-1 49588:-1 50956:1 52474:-2 54438:-1 56521:1 63618:-1
202540  286:1 3953:1 5356:1 9072:1 13795:-1 23821:-1 41755:1 43214:-1 45612:-1 46172:1 55598:-1
202540  3407:1 37238:-1 39212:1 39218:-1 44578:1 51070:1
202540  7504:-1 11594:1 36560:-1 43513:1
202540  5356:1 6204:-1 10012:1 10168:-1 11090:1 14114:-1 14437:-1 18720:1 22369:-1 33038:1 36283:-1 38182:1 40847:1 48736:-2 49346:-1 51470:-1 62562:-1
202540  8661:-1 9381:1 19454:1 27163:1 55619:1 62149:-1 65440:1
202540  9381:1 19974:1 24768:1 25063:1 31787:-1 40703:1 43214:-1 44319:1 63377:1

Answers

  • Nils_Woehler Member Posts: 463 Maven
    Hi,

    Internally everything is correct: the real confidences are used to predict the label. However, in 'Read Sparse' you are using an int_sparse_array to store your data.
    When the confidences are stored in this int_sparse_array, the values are rounded to integers (and are therefore always 0.0). If you change the datamanagement parameter to
    double_sparse_array, the correct values should be shown.

    Best,
    Nils
  • javart Member Posts: 2 Contributor I
    Thank you so much!
    I didn't realize that data structure would also hold the ML output.
    It works correctly after changing the datamanagement parameter of 'Read Sparse' to double_sparse_array.

    Javier.
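
For anyone landing on this thread later: the fix discussed above comes down to a single parameter in the 'Read Sparse' operator. The fragment below is a sketch of that operator as posted in the question, with only the datamanagement value changed from int_sparse_array to double_sparse_array. The file path and dimension are the original poster's and will differ in other processes.

```xml
<!-- 'Read Sparse' operator from the process above; only the datamanagement value differs -->
<operator activated="true" class="read_sparse" compatibility="5.3.012" expanded="true" height="60" name="Read Sparse" width="90" x="112" y="120">
  <parameter key="format" value="yx"/>
  <parameter key="data_file" value="/home/javier/workspace/Taxonomy Integration/data/machineLearning/100012.dat.3"/>
  <parameter key="dimension" value="70000"/>
  <!-- was int_sparse_array, which stores the confidences as integers, so they show up as 0.0 -->
  <parameter key="datamanagement" value="double_sparse_array"/>
  <list key="prefix_map"/>
</operator>
```

As explained in the answer, the example set's storage type also applies to the confidence attributes added by 'Apply Model', which is why a double-based type is needed even though the input features themselves are integers.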