"SVM model results: display bug in charts?"

lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 731 Unicorn

Hi,

I'm experimenting with RapidMiner and it seems that I have found a bug:

I created a simple model using the "SVM" operator.

I run the process and go to the Results view -> "Kernel Model (SVM)" -> Charts:

Then I choose chart style = "Scatter" (other chart styles may also be affected by this bug): it is impossible to display x1 (my first attribute) on the x-axis and x2 (my second attribute) on the y-axis, or vice versa.

Here is a screenshot of the Charts window:

[Screenshot: SVM_charts.png]

The other quantities (counter, label, function value, etc.) are displayed correctly.
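For context on the "function value" column: with the polynomial kernel at degree 1 used in this process, the SVM is linear in x1 and x2. Assuming a plain dot-product kernel (RapidMiner's polynomial kernel may include an additive constant, which this sketch does not model), the function value of an example x is f(x) = sum_i alpha_i * y_i * (x_i . x) + b, which collapses to w0 + w1*x1 + w2*x2. Those are the quantities the disabled "Build SVM Python" operator in the process reads from scikit-learn's `intercept_`, `coef_` and `decision_function` (scikit-learn's `dual_coef_` already stores the products alpha_i * y_i). The minimal sketch below uses made-up support vectors, alphas and bias (hypothetical values, not taken from the attached datasets) to check that the dual and primal forms agree.

```python
# Hypothetical 2-D example: support vectors, their labels (+1/-1),
# dual coefficients alpha_i and bias b are made up for illustration.
support_vectors = [(1.0, 2.0), (2.0, 0.5), (3.0, 3.0)]
labels = [1, -1, 1]
alphas = [0.4, 0.7, 0.3]
bias = -0.5


def dot(u, v):
    """Plain dot product, i.e. a linear (degree-1, no offset) kernel."""
    return sum(a * b for a, b in zip(u, v))


def decision_dual(x):
    """Dual form: f(x) = sum_i alpha_i * y_i * K(x_i, x) + b, with K = dot."""
    return sum(a * y * dot(sv, x)
               for a, y, sv in zip(alphas, labels, support_vectors)) + bias


# Collapse the dual form into explicit weights w = sum_i alpha_i * y_i * x_i;
# these play the role of (w1, w2), and bias plays the role of w0.
w = [sum(a * y * sv[k] for a, y, sv in zip(alphas, labels, support_vectors))
     for k in range(2)]


def decision_primal(x):
    """Primal form: f(x) = w0 + w1*x1 + w2*x2."""
    return bias + dot(w, x)


# Both forms should print the same function value for every point.
for point in [(0.0, 0.0), (1.5, -2.0), (4.0, 1.0)]:
    print(point, round(decision_dual(point), 6), round(decision_primal(point), 6))
```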

My training dataset (04_Class_4.6_SVM_simple_example.csv) and my scoring dataset (score_test_SVM.csv) are attached.

You can find my process here:

<?xml version="1.0" encoding="UTF-8"?><process version="7.6.002">
<context>
<input/>
<output/>
<macros/>
</context>
<operator activated="true" class="process" compatibility="7.6.002" expanded="true" name="Process">
<process expanded="true">
<operator activated="true" class="read_csv" compatibility="7.6.002" expanded="true" height="68" name="Read_TrainSet" width="90" x="45" y="85">
<parameter key="csv_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Predictive_Analytics_and_Data_Mining\Dec 15 2014\04_Class_4.6_SVM_simple_example.csv"/>
<parameter key="column_separators" value="\s+"/>
<list key="annotations"/>
<list key="data_set_meta_data_information"/>
</operator>
<operator activated="true" class="set_role" compatibility="7.6.002" expanded="true" height="82" name="Set Role" width="90" x="179" y="34">
<parameter key="attribute_name" value="class"/>
<parameter key="target_role" value="label"/>
<list key="set_additional_roles"/>
</operator>
<operator activated="true" class="support_vector_machine" compatibility="7.6.002" expanded="true" height="124" name="SVM" width="90" x="313" y="34">
<parameter key="kernel_type" value="polynomial"/>
<parameter key="kernel_degree" value="1.0"/>
<parameter key="C" value="1.0"/>
<parameter key="convergence_epsilon" value="1.0E-5"/>
<parameter key="max_iterations" value="10000000"/>
<parameter key="scale" value="false"/>
</operator>
<operator activated="false" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="82" name="Read_TrainSet (2)" width="90" x="45" y="340">
<parameter key="script" value="import pandas as pd&#10;&#10;# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;def rm_main():&#10;&#10; path = r'C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Predictive_Analytics_and_Data_Mining\Dec 15 2014'&#10; data = pd.read_csv(path + '/04_Class_4.6_SVM_simple_example.csv',sep =r'\s+')&#10;&#10; # connect 2 output ports to see the results&#10; return data"/>
</operator>
<operator activated="false" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="103" name="Build SVM Python" width="90" x="179" y="340">
<parameter key="script" value="import pandas as pd&#10;import numpy as np&#10;from sklearn.svm import SVC&#10;from sklearn.calibration import CalibratedClassifierCV&#10;&#10;# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;def rm_main(train):&#10;&#10; X = train.iloc[:,0:2]&#10; y = train.iloc[:,2]&#10; x1 = train.iloc[:,0]&#10; x2 = train.iloc[:,1]&#10;&#10; model = SVC(kernel = 'linear', probability = True,degree = 1,tol = 1e-5,random_state = 1992 )&#10; #model_calibre = CalibratedClassifierCV(model)&#10; model_calibre = CalibratedClassifierCV(model,method = 'isotonic')&#10; model.fit(X,y)&#10; model_calibre.fit(X,y)&#10; &#10; [[w1,w2]] = model.coef_&#10; [w0] = model.intercept_&#10;&#10; support = model.support_&#10; [dual_coef] = model.dual_coef_&#10; decfunction = model.decision_function(X)&#10;&#10; support = pd.DataFrame(data =support,columns = ['support']) &#10; alpha= pd.DataFrame(data = dual_coef,columns = ['alpha'])&#10; abs_alpha = pd.DataFrame(data = np.absolute(dual_coef),columns = ['abs(alpha)'])&#10; alpha = alpha.join(abs_alpha,how = 'left')&#10; alpha = alpha.join(support,how = 'left')&#10; alpha = alpha.set_index('support')&#10;&#10; dec_func = pd.DataFrame(data = decfunction,columns = ['decision function'])&#10; dec_func = dec_func.join(y)&#10; dec_func = dec_func.join([x1,x2],how = 'outer')&#10; &#10; dec_func =pd.concat([dec_func,alpha], axis = 1)&#10; &#10; weight = pd.DataFrame(data = [[w0,w1,w2]],columns = ['w0','w1','w2']) &#10; weight = pd.concat([weight,dec_func])&#10; &#10; #weight.rm_metadata['w0']=(None,'w0')&#10; #weight.rm_metadata['w1']=(None,'w1')&#10; #weight.rm_metadata['w2']=(None,'w2')&#10; #weight.rm_metadata['decision function']=(None,'decision function')&#10; #weight.rm_metadata['label']=(None,'label')&#10; &#10;&#10; # connect 2 output ports to see the results&#10; return weight,model,model_calibre"/>
</operator>
<operator activated="true" class="read_csv" compatibility="7.6.002" expanded="true" height="68" name="Read_ScoreSet" width="90" x="313" y="187">
<parameter key="csv_file" value="C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Predictive_Analytics_and_Data_Mining\Dec 15 2014\score_test_SVM.csv"/>
<parameter key="column_separators" value="\s+"/>
<list key="annotations"/>
<list key="data_set_meta_data_information"/>
</operator>
<operator activated="true" class="apply_model" compatibility="7.6.002" expanded="true" height="82" name="Apply Model" width="90" x="447" y="136">
<list key="application_parameters"/>
</operator>
<operator activated="false" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="82" name="Read_ScoreSet (2)" width="90" x="179" y="493">
<parameter key="script" value="import pandas as pd&#10;&#10;# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;def rm_main():&#10;&#10; path = r'C:\Users\Lionel\Documents\Formations_DataScience\Rapidminer\Predictive_Analytics_and_Data_Mining\Dec 15 2014'&#10; data = pd.read_csv(path + '/score_test_SVM.csv',sep =r'\s+')&#10;&#10; # connect 2 output ports to see the results&#10; return data"/>
</operator>
<operator activated="false" class="python_scripting:execute_python" compatibility="7.4.000" expanded="true" height="124" name="Apply Model Python" width="90" x="447" y="391">
<parameter key="script" value="import pandas as pd&#10;from sklearn.svm import SVC&#10;&#10;# rm_main is a mandatory function, &#10;# the number of arguments has to be the number of input ports (can be none)&#10;def rm_main(model,score, model_calibre):&#10;&#10; X = score.iloc[:,0:2]&#10; &#10; pred = model.predict(X)&#10; #conf = model.predict_proba(X)&#10; conf = model_calibre.predict_proba(X)&#10; dec_function = model.decision_function(X)&#10;&#10; score['prediction (class)'] = pred&#10; score['confidence(A)'] = conf[:,0]&#10; score['confidence(B)'] = conf[:,1]&#10; score['decision function'] = dec_function&#10;&#10; score.rm_metadata['prediction (class)']=(None,'prediction (class)')&#10; score.rm_metadata['confidence(A)']=(None,'confidence(A)')&#10; score.rm_metadata['confidence(B)']=(None,'confidence(B)')&#10; score.rm_metadata['decision function']=(None,'decision function')&#10; &#10; # connect 2 output ports to see the results&#10; return score"/>
</operator>
<connect from_op="Read_TrainSet" from_port="output" to_op="Set Role" to_port="example set input"/>
<connect from_op="Set Role" from_port="example set output" to_op="SVM" to_port="training set"/>
<connect from_op="SVM" from_port="model" to_op="Apply Model" to_port="model"/>
<connect from_op="Read_TrainSet (2)" from_port="output 1" to_op="Build SVM Python" to_port="input 1"/>
<connect from_op="Build SVM Python" from_port="output 2" to_op="Apply Model Python" to_port="input 1"/>
<connect from_op="Build SVM Python" from_port="output 3" to_op="Apply Model Python" to_port="input 3"/>
<connect from_op="Read_ScoreSet" from_port="output" to_op="Apply Model" to_port="unlabelled data"/>
<connect from_op="Apply Model" from_port="labelled data" to_port="result 2"/>
<connect from_op="Apply Model" from_port="model" to_port="result 1"/>
<connect from_op="Read_ScoreSet (2)" from_port="output 1" to_op="Apply Model Python" to_port="input 2"/>
<portSpacing port="source_input 1" spacing="0"/>
<portSpacing port="sink_result 1" spacing="0"/>
<portSpacing port="sink_result 2" spacing="0"/>
<portSpacing port="sink_result 3" spacing="0"/>
<portSpacing port="sink_result 4" spacing="0"/>
<portSpacing port="sink_result 5" spacing="0"/>
</process>
</operator>
</process>
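A small robustness note on the disabled "Apply Model Python" operator: it assigns `conf[:,0]` to `confidence(A)` and `conf[:,1]` to `confidence(B)` purely by position. scikit-learn orders `predict_proba` columns like the fitted estimator's `classes_` attribute, so building the column names from `classes_` avoids hard-coding that order. The sketch below is plain Python so it runs without scikit-learn; the helper `confidence_columns` is hypothetical, and `classes` / `proba_rows` are made-up stand-ins for `model_calibre.classes_` and `model_calibre.predict_proba(X)`.

```python
def confidence_columns(classes, proba_rows):
    """Map each class label to its column of probabilities.

    classes    -- class labels in the order the model uses for its
                  probability columns (e.g. an estimator's classes_)
    proba_rows -- one sequence of probabilities per example
    Returns a dict {'confidence(<label>)': [p_example1, p_example2, ...]}.
    """
    columns = {}
    for idx, label in enumerate(classes):
        columns['confidence({})'.format(label)] = [row[idx] for row in proba_rows]
    return columns


# Hypothetical values standing in for model_calibre.classes_ and
# model_calibre.predict_proba(X).
classes = ['A', 'B']
proba_rows = [[0.9, 0.1], [0.25, 0.75]]
cols = confidence_columns(classes, proba_rows)
print(cols)
```

Inside `rm_main`, assuming `conf` is the array returned by `predict_proba`, the two hard-coded assignments could become a loop such as `for name, values in confidence_columns(model_calibre.classes_, conf).items(): score[name] = values`.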

Thank you for your explanations,

Regards,

Lionel

Fixed and Released

Comments

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,351 Community Manager

    Bug in the scatter plot function confirmed. Pushing to the dev team.


    SG

  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,351 Community Manager

    Fixed and scheduled for release.
