prediction with new attribute

hung9022 Member Posts: 13 Contributor I
edited November 2018 in Help

Hi.
I am currently building a predictive model. I have a data set of size 200x500, and I want to use SVM to predict the next 10 attributes, so that the finished data set will be 200x510. I have tried to loop, generate a new attribute, and replace its values with the predicted values, but it does not work, because I do not understand how to set up the macro to generate numbered attribute names. I want the new attributes to have names like "att501, att502, att503, etc.". Any help would be appreciated.

Answers

  • rfuentealba Moderator, RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn

    Hello, @hung9022

     

    Would you mind sharing your XML process so we can understand what you are doing? Thanks in advance.

     

    All the best,

     

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    hi @hung9022,

     

    I agree with @rfuentealba: sharing your process will help us better understand your problem.

     

    But as a first answer, does this process help you?

    <?xml version="1.0" encoding="UTF-8"?><process version="9.0.000-BETA">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="6.0.002" expanded="true" name="Root">
    <parameter key="random_seed" value="1969"/>
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="9.0.000-BETA" expanded="true" height="68" name="Polynomial" width="90" x="45" y="34">
    <parameter key="repository_entry" value="//Samples/data/Polynomial"/>
    </operator>
    <operator activated="true" class="multiply" compatibility="9.0.000-BETA" expanded="true" height="103" name="Multiply (3)" width="90" x="179" y="136"/>
    <operator activated="true" class="generate_id" compatibility="9.0.000-BETA" expanded="true" height="82" name="Generate ID" width="90" x="380" y="136"/>
    <operator activated="true" class="concurrency:loop" compatibility="9.0.000-BETA" expanded="true" height="82" name="Loop" width="90" x="313" y="34">
    <parameter key="number_of_iterations" value="10"/>
    <parameter key="reuse_results" value="true"/>
    <process expanded="true">
    <operator activated="true" class="generate_attributes" compatibility="9.0.000-BETA" expanded="true" height="82" name="Generate Attributes" width="90" x="380" y="34">
    <list key="function_descriptions">
    <parameter key="att_50%{iteration}" value="a1"/>
    </list>
    </operator>
    <connect from_port="input 1" to_op="Generate Attributes" to_port="example set input"/>
    <connect from_op="Generate Attributes" from_port="example set output" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="concurrency:loop_attributes" compatibility="9.0.000-BETA" expanded="true" height="82" name="Loop Attributes" width="90" x="447" y="34">
    <parameter key="attribute_filter_type" value="regular_expression"/>
    <parameter key="regular_expression" value="att_.*"/>
    <process expanded="true">
    <operator activated="true" class="set_role" compatibility="9.0.000-BETA" expanded="true" height="82" name="Set Role" width="90" x="179" y="34">
    <parameter key="attribute_name" value="%{loop_attribute}"/>
    <parameter key="target_role" value="label"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="9.0.000-BETA" expanded="true" height="82" name="Select Attributes" width="90" x="380" y="34">
    <parameter key="attribute_filter_type" value="regular_expression"/>
    <parameter key="regular_expression" value="att_.*"/>
    <parameter key="invert_selection" value="true"/>
    </operator>
    <operator activated="true" class="multiply" compatibility="9.0.000-BETA" expanded="true" height="103" name="Multiply" width="90" x="447" y="136"/>
    <operator activated="true" class="support_vector_machine" compatibility="9.0.000-BETA" expanded="true" height="124" name="SVM" width="90" x="581" y="34"/>
    <operator activated="true" class="apply_model" compatibility="9.0.000-BETA" expanded="true" height="82" name="Apply Model" width="90" x="715" y="187">
    <list key="application_parameters"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="9.0.000-BETA" expanded="true" height="82" name="Select Attributes (2)" width="90" x="849" y="187">
    <parameter key="attribute_filter_type" value="regular_expression"/>
    <parameter key="regular_expression" value="att_.*|prediction(.*)"/>
    <parameter key="include_special_attributes" value="true"/>
    </operator>
    <operator activated="true" class="transpose" compatibility="9.0.000-BETA" expanded="true" height="82" name="Transpose" width="90" x="983" y="136"/>
    <connect from_port="input 1" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Set Role" from_port="example set output" to_op="Select Attributes" to_port="example set input"/>
    <connect from_op="Select Attributes" from_port="example set output" to_op="Multiply" to_port="input"/>
    <connect from_op="Multiply" from_port="output 1" to_op="SVM" to_port="training set"/>
    <connect from_op="Multiply" from_port="output 2" to_op="Apply Model" to_port="unlabelled data"/>
    <connect from_op="SVM" from_port="model" to_op="Apply Model" to_port="model"/>
    <connect from_op="Apply Model" from_port="labelled data" to_op="Select Attributes (2)" to_port="example set input"/>
    <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Transpose" to_port="example set input"/>
    <connect from_op="Transpose" from_port="example set output" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    </operator>
    <operator activated="true" class="append" compatibility="9.0.000-BETA" expanded="true" height="82" name="Append" width="90" x="581" y="34"/>
    <operator activated="true" class="transpose" compatibility="9.0.000-BETA" expanded="true" height="82" name="Transpose (2)" width="90" x="715" y="34"/>
    <operator activated="true" class="generate_id" compatibility="9.0.000-BETA" expanded="true" height="82" name="Generate ID (2)" width="90" x="849" y="34"/>
    <operator activated="true" class="concurrency:join" compatibility="9.0.000-BETA" expanded="true" height="82" name="Join" width="90" x="983" y="85">
    <parameter key="use_id_attribute_as_key" value="false"/>
    <list key="key_attributes">
    <parameter key="id" value="id"/>
    </list>
    </operator>
    <connect from_op="Polynomial" from_port="output" to_op="Multiply (3)" to_port="input"/>
    <connect from_op="Multiply (3)" from_port="output 1" to_op="Loop" to_port="input 1"/>
    <connect from_op="Multiply (3)" from_port="output 2" to_op="Generate ID" to_port="example set input"/>
    <connect from_op="Generate ID" from_port="example set output" to_op="Join" to_port="right"/>
    <connect from_op="Loop" from_port="output 1" to_op="Loop Attributes" to_port="input 1"/>
    <connect from_op="Loop Attributes" from_port="output 1" to_op="Append" to_port="example set 1"/>
    <connect from_op="Append" from_port="merged set" to_op="Transpose (2)" to_port="example set input"/>
    <connect from_op="Transpose (2)" from_port="example set output" to_op="Generate ID (2)" to_port="example set input"/>
    <connect from_op="Generate ID (2)" from_port="example set output" to_op="Join" to_port="left"/>
    <connect from_op="Join" from_port="join" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>
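
    A note on the attribute names: macros are expanded as plain text, so `att_50%{iteration}` pastes the counter after the literal `att_50`. The Python sketch below is not RapidMiner code; `expand_macro` is a hypothetical helper, and the 1-to-10 iteration range is an assumption about the loop's counter. It shows what that expansion implies at the tenth iteration, next to a variant that computes the number first.

```python
def expand_macro(template: str, iteration: int) -> str:
    """Textually substitute %{iteration}, mimicking plain-text macro expansion."""
    return template.replace("%{iteration}", str(iteration))


# Assumption: the loop counts its iterations from 1 through 10.
iterations = range(1, 11)

# Template from the process above: the counter is pasted after "att_50".
concatenated = [expand_macro("att_50%{iteration}", i) for i in iterations]

# Hypothetical alternative: do the arithmetic first, then build the name.
computed = [f"att{500 + i}" for i in iterations]

print(concatenated[0], concatenated[-1])  # att_501 att_5010
print(computed[0], computed[-1])          # att501 att510
```

    If names ending exactly at 510 matter, the offset would have to be added before the name is built rather than concatenated as text.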

    Regards,

     

    Lionel

  • hung9022 Member Posts: 13 Contributor I
     

    Hi, sorry for not including the process. Here is the process I wanted to use. I import a CSV file from another source, which produces an example set like the one I mentioned, with a size of 300x500. I want to predict the future values of this data set with SVM. My loop process is where I am trying to generate these values, but I realize this is not the proper way of doing it. I have attached the file that I want to use for this process.

    <?xml version="1.0" encoding="UTF-8"?><process version="8.2.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="8.2.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="read_csv" compatibility="8.2.001" expanded="true" height="68" name="Read CSV" width="90" x="45" y="34">
    <list key="annotations"/>
    <list key="data_set_meta_data_information"/>
    </operator>
    <operator activated="true" class="multiply" compatibility="8.2.001" expanded="true" height="103" name="Multiply" width="90" x="179" y="34"/>
    <operator activated="true" class="remember" compatibility="8.2.001" expanded="true" height="68" name="Remember (3)" width="90" x="581" y="187">
    <parameter key="name" value="unlabeled"/>
    </operator>
    <operator activated="true" class="set_role" compatibility="8.2.001" expanded="true" height="82" name="Set Role" width="90" x="380" y="34">
    <parameter key="attribute_name" value="att1400"/>
    <parameter key="target_role" value="label"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="support_vector_machine" compatibility="8.2.001" expanded="true" height="124" name="SVM" width="90" x="514" y="34"/>
    <operator activated="true" class="concurrency:loop" compatibility="8.2.001" expanded="true" height="82" name="Loop (2)" width="90" x="648" y="34">
    <parameter key="number_of_iterations" value="1"/>
    <process expanded="true">
    <operator activated="true" class="recall" compatibility="8.2.001" expanded="true" height="68" name="Recall (2)" width="90" x="45" y="85">
    <parameter key="name" value="unlabeled"/>
    <parameter key="remove_from_store" value="false"/>
    </operator>
    <operator activated="true" class="apply_model" compatibility="8.2.001" expanded="true" height="82" name="Apply Model" width="90" x="179" y="34">
    <list key="application_parameters"/>
    </operator>
    <operator activated="true" class="multiply" compatibility="8.2.001" expanded="true" height="82" name="Multiply (2)" width="90" x="380" y="34"/>
    <operator activated="true" class="materialize_data" compatibility="8.2.001" expanded="true" height="82" name="Materialize Data" width="90" x="179" y="187"/>
    <operator activated="true" class="set_role" compatibility="8.2.001" expanded="true" height="82" name="Set Role (3)" width="90" x="313" y="187">
    <parameter key="attribute_name" value="prediction(att1400)"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="select_attributes" compatibility="8.2.001" expanded="true" height="82" name="Select Attributes (2)" width="90" x="246" y="340">
    <parameter key="attribute_filter_type" value="single"/>
    <parameter key="attribute" value="att1401"/>
    <parameter key="invert_selection" value="true"/>
    <parameter key="include_special_attributes" value="true"/>
    </operator>
    <operator activated="true" class="rename" compatibility="8.2.001" expanded="true" height="82" name="Rename (2)" width="90" x="380" y="340">
    <parameter key="old_name" value="prediction(att1400)"/>
    <parameter key="new_name" value="att1401"/>
    <list key="rename_additional_attributes"/>
    </operator>
    <operator activated="true" class="remember" compatibility="8.2.001" expanded="true" height="68" name="Remember (4)" width="90" x="581" y="340">
    <parameter key="name" value="unlabeled"/>
    </operator>
    <connect from_port="input 1" to_op="Apply Model" to_port="model"/>
    <connect from_op="Recall (2)" from_port="result" to_op="Apply Model" to_port="unlabelled data"/>
    <connect from_op="Apply Model" from_port="labelled data" to_op="Multiply (2)" to_port="input"/>
    <connect from_op="Multiply (2)" from_port="output 1" to_op="Materialize Data" to_port="example set input"/>
    <connect from_op="Materialize Data" from_port="example set output" to_op="Set Role (3)" to_port="example set input"/>
    <connect from_op="Set Role (3)" from_port="example set output" to_op="Select Attributes (2)" to_port="example set input"/>
    <connect from_op="Select Attributes (2)" from_port="example set output" to_op="Rename (2)" to_port="example set input"/>
    <connect from_op="Rename (2)" from_port="example set output" to_op="Remember (4)" to_port="store"/>
    <connect from_op="Remember (4)" from_port="stored" to_port="output 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="source_input 2" spacing="0"/>
    <portSpacing port="sink_output 1" spacing="0"/>
    <portSpacing port="sink_output 2" spacing="0"/>
    </process>
    </operator>
    <connect from_op="Read CSV" from_port="output" to_op="Multiply" to_port="input"/>
    <connect from_op="Multiply" from_port="output 1" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Multiply" from_port="output 2" to_op="Remember (3)" to_port="store"/>
    <connect from_op="Set Role" from_port="example set output" to_op="SVM" to_port="training set"/>
    <connect from_op="SVM" from_port="model" to_op="Loop (2)" to_port="input 1"/>
    <connect from_op="Loop (2)" from_port="output 1" to_port="result 1"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    </process>
    </operator>
    </process>

     

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi again @hung9022,

     

    I must admit that I do not understand what you want to do.

    Can you explain in more detail, with a simple example of your initial dataset(s) and the dataset you want to obtain?

     

    Regards,

     

    Lionel

  • hung9022 Member Posts: 13 Contributor I

    hi,

    Take the initial data set, let's say a matrix of 343 rows x 600 columns. I want to run this matrix through the SVM prediction algorithm to predict new values, let's say 10 of them. The resulting example set will be a 343x610 matrix in which the 10 new columns hold the predicted values.

    regards,

  • lionelderkrikor Moderator, RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn

    Hi again @hung9022,

     

    Have you checked the process I shared in my first post?

    Does it help you?

     

    Regards,

     

    Lionel

  • rfuentealba Moderator, RapidMiner Certified Analyst, Member, University Professor Posts: 568 Unicorn

     

    Hi there,

     

    I haven't seen the XML processes posted by @hung9022 or @lionelderkrikor because my computers have been performing heavy duty tasks, but this problem is interesting.

     

    I wanted to share some thoughts before heading to lunch.

     

    In certain programming languages, a functor is defined as a function generator that operates on collections rather than on scalar variables. The SVM operator is one of these functors. For the sake of simplicity, let's say there is a "summatory" operator that returns the sum of a collection of values, which makes it a basic functor (the words "suma" and "sumatoria" both seem to translate to "sum" in English, so I had to invent a word here).

     

    So you have the following entries:

     

    2, 2

    3, 3

     

    And you have your functor:

     

    summatory(2, 2)

    summatory(3, 3)

     

     

    The functor returns:

     

    2, 2, 4

    3, 3, 6

     

    And you want to call your functor again:

     

    summatory(2, 2, 4)

    summatory(3, 3, 6)

     

    So you can get the result:

     

    2, 2, 4, 8

    3, 3, 6, 12

     

    Now, is that what you want, but with SVM in place of the summatory?
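
    The repeated application in this example can be sketched in Python as follows. This is only an illustration: `summatory` and `extend_rows` are hypothetical names, and the `step` argument marks the place where an SVM prediction would go instead of the sum.

```python
from typing import Callable, List

Row = List[float]


def summatory(values: Row) -> float:
    """Return the sum of all values currently in the row."""
    return sum(values)


def extend_rows(rows: List[Row], step: Callable[[Row], float], times: int) -> List[Row]:
    """Append step(row) to every row, repeating the whole pass `times` times."""
    result = [list(row) for row in rows]  # copy so the input stays untouched
    for _ in range(times):
        for row in result:
            # Each new value is computed from the row including earlier additions.
            row.append(step(row))
    return result


rows = [[2, 2], [3, 3]]
extended = extend_rows(rows, summatory, times=2)
print(extended)  # [[2, 2, 4, 8], [3, 3, 6, 12]]
```

    Each pass adds one column, so ten passes over a 343x600 matrix would give 343x610.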

     

    Once my computers release memory, I might be able to check both processes, but it looks strange indeed. Perhaps someone else has a better idea.

     

    All the best,

     

    Rodrigo.

  • hung9022 Member Posts: 13 Contributor I

    hi,

    @lionelderkrikor, I have just taken a look at the process you shared. It actually solved one of my problems, generating the new columns, and it also showed me what I did wrong. Thanks a lot.

    @rfuentealba, yes, what I want is the Support Vector Machine prediction. I just want to build a simple prediction algorithm and then test it with the different prediction learners available in RapidMiner to see which is most accurate on my data.

    Regards,
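
    The idea of comparing learners can be sketched outside RapidMiner by holding back the last known column and scoring each candidate on how well it reproduces it. The Python below is a rough illustration under stated assumptions: the three predictors are simple hypothetical stand-ins, not SVM or any RapidMiner learner, and the data is made up.

```python
from statistics import mean
from typing import Callable, Dict, List

Row = List[float]
Predictor = Callable[[Row], float]


def last_value(history: Row) -> float:
    """Stand-in learner: repeat the most recent value."""
    return history[-1]


def row_mean(history: Row) -> float:
    """Stand-in learner: predict the average of the history."""
    return mean(history)


def linear_step(history: Row) -> float:
    """Stand-in learner: continue the last observed difference."""
    return history[-1] + (history[-1] - history[-2])


def mean_absolute_error(rows: List[Row], predictor: Predictor) -> float:
    """Hide each row's last value, predict it from the rest, average the errors."""
    return mean(abs(predictor(row[:-1]) - row[-1]) for row in rows)


# Made-up example data; each row is one example, each column one attribute.
rows = [
    [1.0, 2.0, 3.0, 4.0],
    [2.0, 4.0, 6.0, 8.0],
    [5.0, 5.0, 5.0, 5.0],
]

candidates: Dict[str, Predictor] = {
    "last_value": last_value,
    "row_mean": row_mean,
    "linear_step": linear_step,
}
scores = {name: mean_absolute_error(rows, p) for name, p in candidates.items()}
best = min(scores, key=scores.get)
print(scores, best)
```

    The lowest error wins on this held-out column only; a real comparison would use more than one held-out column or a proper validation scheme.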

     
