Moving SVM model to web production

Robi_Me Member Posts: 32 Maven
edited April 2021 in Help
I have built a small database of speeches made over the past 40 years and scored each speech for its level of consideration, based on the parts of speech and words used. Using this historic scoring, I want to score future speeches that are submitted to a MySQL database via a web interface. I have attached the training data set and the test data set below.

I have tested the various models and SVM has the best R2 and lowest root mean squared deviation. The model may be overfit, due to the number of attributes I am using. I would appreciate your thoughts on that.
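
One way to check the overfitting concern is to compare R2 and root mean squared error on the training data against the same metrics on held-out data; a large gap suggests overfitting. The following is a minimal sketch in plain Python. The numbers are made up for illustration and are not taken from the attached data sets, and the function names are hypothetical.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(squared))

def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_actual = sum(actual) / len(actual)
    ss_res = sum((a - p) ** 2 for a, p in zip(actual, predicted))
    ss_tot = sum((a - mean_actual) ** 2 for a in actual)
    return 1.0 - ss_res / ss_tot

# Made-up scores, for illustration only.
train_actual, train_pred = [1.0, 2.0, 3.0, 4.0], [1.1, 1.9, 3.0, 4.1]
test_actual, test_pred = [1.5, 2.5, 3.5, 4.5], [2.5, 2.0, 2.5, 5.5]

train_rmse, test_rmse = rmse(train_actual, train_pred), rmse(test_actual, test_pred)
train_r2, test_r2 = r_squared(train_actual, train_pred), r_squared(test_actual, test_pred)

# A model that fits the training data far better than unseen data is likely overfit.
print(f"train: RMSE={train_rmse:.3f} R2={train_r2:.3f}")
print(f"test:  RMSE={test_rmse:.3f} R2={test_r2:.3f}")
```

In this toy case the training metrics are near perfect while the held-out metrics are much worse, which is the pattern to look for.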

What I really need to find out is how to productionalise the SVM model in a MySQL/PHP type environment.

<?xml version="1.0" encoding="UTF-8"?><process version="9.7.002">
  <context>
    <input/>
    <output/>
    <macros>
      <macro>
        <key>label</key>
        <value>Response</value>
      </macro>
      <macro>
        <key>label_positive_class</key>
        <value>yes</value>
      </macro>
    </macros>
  </context>
  <operator activated="true" class="process" compatibility="9.4.000" expanded="true" name="Process" origin="GENERATED_TEMPLATE">
    <parameter key="logverbosity" value="init"/>
    <parameter key="random_seed" value="2001"/>
    <parameter key="send_mail" value="never"/>
    <parameter key="notification_email" value=""/>
    <parameter key="process_duration_for_mail" value="30"/>
    <parameter key="encoding" value="SYSTEM"/>
    <process expanded="true">
      <operator activated="true" class="read_csv" compatibility="9.7.002" expanded="true" height="68" name="Read CSV" width="90" x="246" y="391">
        <parameter key="csv_file" value="new.csv"/>
        <parameter key="column_separators" value=";"/>
        <parameter key="trim_lines" value="false"/>
        <parameter key="use_quotes" value="true"/>
        <parameter key="quotes_character" value="'"/>
        <parameter key="escape_character" value="\"/>
        <parameter key="skip_comments" value="true"/>
        <parameter key="comment_characters" value="#"/>
        <parameter key="starting_row" value="1"/>
        <parameter key="parse_numbers" value="true"/>
        <parameter key="decimal_character" value="."/>
        <parameter key="grouped_digits" value="false"/>
        <parameter key="grouping_character" value=","/>
        <parameter key="infinity_representation" value=""/>
        <parameter key="date_format" value=""/>
        <parameter key="first_row_as_names" value="true"/>
        <list key="annotations"/>
        <parameter key="time_zone" value="SYSTEM"/>
        <parameter key="locale" value="English (United States)"/>
        <parameter key="encoding" value="UTF-8"/>
        <parameter key="read_all_values_as_polynominal" value="false"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="COL 1.true.real.attribute"/>
          <parameter key="1" value="COL 2.true.real.attribute"/>
          <parameter key="2" value="COL 3.true.real.attribute"/>
          <parameter key="3" value="COL 4.true.real.attribute"/>
          <parameter key="4" value="COL 5.true.real.attribute"/>
          <parameter key="5" value="COL 6.true.real.attribute"/>
          <parameter key="6" value="COL 7.true.real.attribute"/>
          <parameter key="7" value="COL 8.true.real.attribute"/>
          <parameter key="8" value="COL 9.true.real.attribute"/>
          <parameter key="9" value="COL 10.true.real.attribute"/>
          <parameter key="10" value="COL 11.true.real.attribute"/>
          <parameter key="11" value="COL 12.true.real.attribute"/>
          <parameter key="12" value="COL 13.true.real.attribute"/>
          <parameter key="13" value="COL 14.true.real.attribute"/>
          <parameter key="14" value="COL 15.true.real.attribute"/>
          <parameter key="15" value="COL 16.true.real.attribute"/>
          <parameter key="16" value="COL 17.true.real.attribute"/>
          <parameter key="17" value="COL 18.true.real.attribute"/>
          <parameter key="18" value="COL 19.true.real.attribute"/>
          <parameter key="19" value="COL 20.true.real.attribute"/>
          <parameter key="20" value="COL 21.true.real.attribute"/>
          <parameter key="21" value="COL 22.true.real.attribute"/>
          <parameter key="22" value="COL 23.true.real.attribute"/>
          <parameter key="23" value="COL 24.true.real.attribute"/>
          <parameter key="24" value="COL 25.true.real.attribute"/>
          <parameter key="25" value="COL 26.true.real.attribute"/>
          <parameter key="26" value="COL 27.true.real.attribute"/>
          <parameter key="27" value="COL 28.true.real.attribute"/>
          <parameter key="28" value="COL 29.true.real.attribute"/>
          <parameter key="29" value="COL 30.true.real.attribute"/>
          <parameter key="30" value="COL 31.true.real.attribute"/>
          <parameter key="31" value="COL 32.true.real.attribute"/>
          <parameter key="32" value="COL 33.true.real.attribute"/>
          <parameter key="33" value="COL 34.true.real.attribute"/>
          <parameter key="34" value="COL 35.true.real.attribute"/>
          <parameter key="35" value="results\.request_id.true.polynominal.attribute"/>
          <parameter key="36" value="name.true.polynominal.attribute"/>
          <parameter key="37" value="url.true.polynominal.attribute"/>
        </list>
        <parameter key="read_not_matching_values_as_missings" value="false"/>
        <parameter key="datamanagement" value="double_array"/>
        <parameter key="data_management" value="auto"/>
      </operator>
      <operator activated="true" class="read_csv" compatibility="9.7.002" expanded="true" height="68" name="Read CSV (2)" width="90" x="45" y="187">
        <parameter key="csv_file" value="training.csv"/>
        <parameter key="column_separators" value=";"/>
        <parameter key="trim_lines" value="false"/>
        <parameter key="use_quotes" value="true"/>
        <parameter key="quotes_character" value="&quot;"/>
        <parameter key="escape_character" value="\"/>
        <parameter key="skip_comments" value="true"/>
        <parameter key="comment_characters" value="#"/>
        <parameter key="starting_row" value="1"/>
        <parameter key="parse_numbers" value="true"/>
        <parameter key="decimal_character" value="."/>
        <parameter key="grouped_digits" value="false"/>
        <parameter key="grouping_character" value=","/>
        <parameter key="infinity_representation" value=""/>
        <parameter key="date_format" value=""/>
        <parameter key="first_row_as_names" value="true"/>
        <list key="annotations"/>
        <parameter key="time_zone" value="SYSTEM"/>
        <parameter key="locale" value="English (United States)"/>
        <parameter key="encoding" value="UTF-8"/>
        <parameter key="read_all_values_as_polynominal" value="false"/>
        <list key="data_set_meta_data_information">
          <parameter key="0" value="COL 1.true.real.attribute"/>
          <parameter key="1" value="COL 2.true.real.attribute"/>
          <parameter key="2" value="COL 3.true.real.attribute"/>
          <parameter key="3" value="COL 4.true.real.attribute"/>
          <parameter key="4" value="COL 5.true.real.attribute"/>
          <parameter key="5" value="COL 6.true.real.attribute"/>
          <parameter key="6" value="COL 7.true.real.attribute"/>
          <parameter key="7" value="COL 8.true.real.attribute"/>
          <parameter key="8" value="COL 9.true.real.attribute"/>
          <parameter key="9" value="COL 10.true.real.attribute"/>
          <parameter key="10" value="COL 11.true.real.attribute"/>
          <parameter key="11" value="COL 12.true.real.attribute"/>
          <parameter key="12" value="COL 13.true.real.attribute"/>
          <parameter key="13" value="COL 14.true.real.attribute"/>
          <parameter key="14" value="COL 15.true.real.attribute"/>
          <parameter key="15" value="COL 16.true.real.attribute"/>
          <parameter key="16" value="COL 17.true.real.attribute"/>
          <parameter key="17" value="COL 18.true.real.attribute"/>
          <parameter key="18" value="COL 19.true.real.attribute"/>
          <parameter key="19" value="COL 20.true.real.attribute"/>
          <parameter key="20" value="COL 21.true.real.attribute"/>
          <parameter key="21" value="COL 22.true.real.attribute"/>
          <parameter key="22" value="COL 23.true.real.attribute"/>
          <parameter key="23" value="COL 24.true.real.attribute"/>
          <parameter key="24" value="COL 25.true.real.attribute"/>
          <parameter key="25" value="COL 26.true.real.attribute"/>
          <parameter key="26" value="COL 27.true.real.attribute"/>
          <parameter key="27" value="COL 28.true.real.attribute"/>
          <parameter key="28" value="COL 29.true.real.attribute"/>
          <parameter key="29" value="COL 30.true.real.attribute"/>
          <parameter key="30" value="COL 31.true.real.attribute"/>
          <parameter key="31" value="COL 32.true.real.attribute"/>
          <parameter key="32" value="COL 33.true.real.attribute"/>
          <parameter key="33" value="COL 34.true.real.attribute"/>
          <parameter key="34" value="COL 35.true.real.attribute"/>
          <parameter key="35" value="score.true.real.label"/>
        </list>
        <parameter key="read_not_matching_values_as_missings" value="false"/>
        <parameter key="datamanagement" value="double_array"/>
        <parameter key="data_management" value="auto"/>
      </operator>
      <operator activated="true" class="multiply" compatibility="9.7.002" expanded="true" height="103" name="Multiply" width="90" x="179" y="187"/>
      <operator activated="true" class="concurrency:cross_validation" compatibility="8.2.000" expanded="true" height="145" name="Cross Validation" origin="GENERATED_TEMPLATE" width="90" x="45" y="493">
        <parameter key="split_on_batch_attribute" value="false"/>
        <parameter key="leave_one_out" value="false"/>
        <parameter key="number_of_folds" value="10"/>
        <parameter key="sampling_type" value="shuffled sampling"/>
        <parameter key="use_local_random_seed" value="false"/>
        <parameter key="local_random_seed" value="1992"/>
        <parameter key="enable_parallel_execution" value="true"/>
        <process expanded="true">
          <operator activated="false" class="naive_bayes" compatibility="9.7.002" expanded="true" height="82" name="Naive Bayes" origin="GENERATED_TEMPLATE" width="90" x="45" y="34">
            <parameter key="laplace_correction" value="true"/>
          </operator>
          <operator activated="true" class="support_vector_machine" compatibility="9.7.002" expanded="true" height="124" name="SVM" width="90" x="112" y="187">
            <parameter key="kernel_type" value="dot"/>
            <parameter key="kernel_gamma" value="1.0"/>
            <parameter key="kernel_sigma1" value="1.0"/>
            <parameter key="kernel_sigma2" value="0.0"/>
            <parameter key="kernel_sigma3" value="2.0"/>
            <parameter key="kernel_shift" value="1.0"/>
            <parameter key="kernel_degree" value="2.0"/>
            <parameter key="kernel_a" value="1.0"/>
            <parameter key="kernel_b" value="0.0"/>
            <parameter key="kernel_cache" value="200"/>
            <parameter key="C" value="0.0"/>
            <parameter key="convergence_epsilon" value="0.001"/>
            <parameter key="max_iterations" value="1000000"/>
            <parameter key="scale" value="true"/>
            <parameter key="calculate_weights" value="true"/>
            <parameter key="return_optimization_performance" value="true"/>
            <parameter key="L_pos" value="1.0"/>
            <parameter key="L_neg" value="1.0"/>
            <parameter key="epsilon" value="0.0"/>
            <parameter key="epsilon_plus" value="0.0"/>
            <parameter key="epsilon_minus" value="0.0"/>
            <parameter key="balance_cost" value="true"/>
            <parameter key="quadratic_loss_pos" value="false"/>
            <parameter key="quadratic_loss_neg" value="true"/>
            <parameter key="estimate_performance" value="false"/>
          </operator>
          <connect from_port="training set" to_op="SVM" to_port="training set"/>
          <connect from_op="SVM" from_port="model" to_port="model"/>
          <connect from_op="SVM" from_port="exampleSet" to_port="through 1"/>
          <portSpacing port="source_training set" spacing="0"/>
          <portSpacing port="sink_model" spacing="0"/>
          <portSpacing port="sink_through 1" spacing="0"/>
          <portSpacing port="sink_through 2" spacing="0"/>
        </process>
        <process expanded="true">
          <operator activated="true" class="apply_model" compatibility="9.7.002" expanded="true" height="82" name="Apply Model (2)" origin="GENERATED_TEMPLATE" width="90" x="45" y="34">
            <list key="application_parameters"/>
            <parameter key="create_view" value="false"/>
          </operator>
          <operator activated="true" class="performance" compatibility="9.7.002" expanded="true" height="82" name="Performance" width="90" x="179" y="34">
            <parameter key="use_example_weights" value="true"/>
          </operator>
          <connect from_port="model" to_op="Apply Model (2)" to_port="model"/>
          <connect from_port="test set" to_op="Apply Model (2)" to_port="unlabelled data"/>
          <connect from_op="Apply Model (2)" from_port="labelled data" to_op="Performance" to_port="labelled data"/>
          <connect from_op="Performance" from_port="performance" to_port="performance 1"/>
          <connect from_op="Performance" from_port="example set" to_port="test set results"/>
          <portSpacing port="source_model" spacing="0"/>
          <portSpacing port="source_test set" spacing="0"/>
          <portSpacing port="source_through 1" spacing="0"/>
          <portSpacing port="source_through 2" spacing="0"/>
          <portSpacing port="sink_test set results" spacing="0"/>
          <portSpacing port="sink_performance 1" spacing="0"/>
          <portSpacing port="sink_performance 2" spacing="0"/>
        </process>
      </operator>
      <operator activated="true" class="apply_model" compatibility="9.7.002" expanded="true" height="82" name="Apply Model" origin="GENERATED_TEMPLATE" width="90" x="514" y="493">
        <list key="application_parameters"/>
        <parameter key="create_view" value="false"/>
      </operator>
      <operator activated="true" class="numerical_to_polynominal" compatibility="9.7.002" expanded="true" height="82" name="Numerical to Polynominal" width="90" x="380" y="187">
        <parameter key="attribute_filter_type" value="single"/>
        <parameter key="attribute" value="score"/>
        <parameter key="attributes" value=""/>
        <parameter key="use_except_expression" value="false"/>
        <parameter key="value_type" value="numeric"/>
        <parameter key="use_value_type_exception" value="false"/>
        <parameter key="except_value_type" value="real"/>
        <parameter key="block_type" value="value_series"/>
        <parameter key="use_block_type_exception" value="false"/>
        <parameter key="except_block_type" value="value_series_end"/>
        <parameter key="invert_selection" value="false"/>
        <parameter key="include_special_attributes" value="true"/>
      </operator>
      <operator activated="true" class="weight_by_information_gain" compatibility="9.7.002" expanded="true" height="82" name="Weight by Information Gain" width="90" x="514" y="187">
        <parameter key="normalize_weights" value="false"/>
        <parameter key="sort_weights" value="true"/>
        <parameter key="sort_direction" value="ascending"/>
      </operator>
      <connect from_op="Read CSV" from_port="output" to_op="Apply Model" to_port="unlabelled data"/>
      <connect from_op="Read CSV (2)" from_port="output" to_op="Multiply" to_port="input"/>
      <connect from_op="Multiply" from_port="output 1" to_op="Numerical to Polynominal" to_port="example set input"/>
      <connect from_op="Multiply" from_port="output 2" to_op="Cross Validation" to_port="example set"/>
      <connect from_op="Cross Validation" from_port="model" to_op="Apply Model" to_port="model"/>
      <connect from_op="Cross Validation" from_port="example set" to_port="result 3"/>
      <connect from_op="Cross Validation" from_port="test result set" to_port="result 4"/>
      <connect from_op="Cross Validation" from_port="performance 1" to_port="result 5"/>
      <connect from_op="Apply Model" from_port="labelled data" to_port="result 7"/>
      <connect from_op="Apply Model" from_port="model" to_port="result 6"/>
      <connect from_op="Numerical to Polynominal" from_port="example set output" to_op="Weight by Information Gain" to_port="example set"/>
      <connect from_op="Weight by Information Gain" from_port="weights" to_port="result 1"/>
      <connect from_op="Weight by Information Gain" from_port="example set" to_port="result 2"/>
      <portSpacing port="source_input 1" spacing="0"/>
      <portSpacing port="sink_result 1" spacing="147"/>
      <portSpacing port="sink_result 2" spacing="0"/>
      <portSpacing port="sink_result 3" spacing="0"/>
      <portSpacing port="sink_result 4" spacing="0"/>
      <portSpacing port="sink_result 5" spacing="0"/>
      <portSpacing port="sink_result 6" spacing="0"/>
      <portSpacing port="sink_result 7" spacing="0"/>
      <portSpacing port="sink_result 8" spacing="0"/>
      <description align="left" color="blue" colored="true" height="228" resized="true" width="317" x="20" y="105">Step 1:&lt;br&gt;Load and prepare data</description>
      <description align="left" color="green" colored="true" height="224" resized="true" width="291" x="349" y="108">Step 2:&lt;br&gt;Determine which factors influence the model to improve prediction.</description>
      <description align="left" color="green" colored="true" height="321" resized="true" width="442" x="198" y="340">Step 5:&lt;br&gt;Load new data for scoring</description>
      <description align="left" color="yellow" colored="false" height="70" resized="true" width="850" x="20" y="25">Civility scorecard&lt;br&gt;Create a model that looks at scores related to the receptiviti model and predict what the final score will be</description>
      <description align="left" color="yellow" colored="true" height="322" resized="true" width="176" x="18" y="339">Step 4:&lt;br&gt;Build and test model</description>
    </process>
  </operator>
</process>


Best Answer

  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Solution Accepted
    Hi Robi_Me,

    There are models that are simple, and there are ones that need a full implementation of the matching software to apply them.

    E.g., linear regression is just a formula, and decision trees are just a bunch of if ... then conditions. Unfortunately, SVM is more complex.
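
    To illustrate what "just a formula" and "a bunch of if ... then conditions" look like once written out, here is a minimal sketch in Python. The coefficients, thresholds, and function names are hypothetical and not derived from the process above; the point is only that both model types port easily to PHP or SQL.

```python
def linear_regression_score(features, coefficients, intercept):
    """A linear regression model is an intercept plus a weighted sum."""
    return intercept + sum(c * x for c, x in zip(coefficients, features))

def decision_tree_score(features):
    """A decision tree is a set of nested threshold checks (hypothetical tree)."""
    if features[0] <= 0.5:
        if features[1] <= 2.0:
            return 1.0
        return 2.0
    return 3.0

# Hypothetical coefficients and inputs, for illustration only.
linear_result = linear_regression_score([1.0, 3.0], coefficients=[0.4, -0.2], intercept=1.5)
tree_result_a = decision_tree_score([1.0, 3.0])
tree_result_b = decision_tree_score([0.2, 3.0])

print(linear_result, tree_result_a, tree_result_b)
```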

    In the RapidMiner ecosystem, RTS (Real-Time Scoring) or AI Hub is meant for this task. Either would provide exactly the web service you need.

    Studio doesn't offer this functionality. 
    You could rebuild your model in Python or R and put those into some container on your web server. But that's outside the scope of this community. 
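
    As a rough idea of what a rebuilt SVM scorer involves, the sketch below shows the general kernel prediction function, a bias plus a kernel-weighted sum over support vectors, in plain Python. It assumes the support vectors, their coefficients, and the bias are available from a trained model; how to obtain them is not covered here, all values and names are hypothetical, and any input scaling applied during training (the process above sets scale to true) would have to be reproduced before scoring. The process above uses the dot kernel, and for that kernel the sum collapses to one weight per attribute, which is again just a formula.

```python
def dot_kernel(u, v):
    """Linear (dot product) kernel."""
    return sum(a * b for a, b in zip(u, v))

def svm_predict(x, support_vectors, alphas, bias, kernel=dot_kernel):
    """Kernel SVM prediction: bias + sum_i alpha_i * K(sv_i, x)."""
    return bias + sum(a * kernel(sv, x) for a, sv in zip(alphas, support_vectors))

def collapse_to_weights(support_vectors, alphas):
    """For the dot kernel only: w_j = sum_i alpha_i * sv_i[j]."""
    n_attributes = len(support_vectors[0])
    return [
        sum(a * sv[j] for a, sv in zip(alphas, support_vectors))
        for j in range(n_attributes)
    ]

# Hypothetical model parameters, for illustration only.
support_vectors = [[1.0, 0.0], [0.0, 2.0]]
alphas = [0.5, -0.25]
bias = 0.1
x = [2.0, 4.0]

kernel_score = svm_predict(x, support_vectors, alphas, bias)

# With a dot kernel the same score comes from a plain weighted sum.
weights = collapse_to_weights(support_vectors, alphas)
linear_score = bias + dot_kernel(weights, x)

print(kernel_score, linear_score, weights)
```

    With a non-linear kernel the collapse step is not available, so the support vectors themselves have to be shipped with the scorer, which is why such models are harder to move.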

    Regards,
    Balázs

Answers

  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Hi @Robi_Me,

    RapidMiner AI Hub contains functionality for exposing processes as web services, either directly with the web service functionality or using the Real-Time Scoring engine, which has higher performance.
    AI Hub can be installed on premise or used as a cloud service hosted by RapidMiner, paid according to usage.

    You could also check whether RapidMiner Go creates a similarly good model for you. (Text mining isn't yet implemented in Go, though.) Go offers a few-clicks way to export models for scoring with a web service.

    Regards,
    Balázs
  • Robi_Me Member Posts: 32 Maven
    @BalazsBarany thanks, I have already looked at Go, but got better results out of Studio. I have never been in a situation where I needed to use a model outside of the RapidMiner environment, so I am trying to understand how a model created inside Studio can be used in an external environment. Not everyone I work for uses RapidMiner, no matter how much I try to encourage them to. Now that I am ready to export results to a client, how would I implement an SVM model on their web services?
  • Robi_Me Member Posts: 32 Maven
    thank you @BalazsBarany