"Setting Custom Weights From Example Set Table"

dragoljubdragoljub Member Posts: 241 Contributor II
edited May 2019 in Help
Hi Everyone,

I have a table of attribute names and corresponding weights that were computed independently, outside RapidMiner. I need to convert this Example Set into an attribute weights object. I assumed that Data to Weights would do this for me, but it seems to only create a weight set from the attributes, with every weight set to 1. What is the purpose of this?

I know there is the Weight by User Specification operator, but that requires setting each attribute weight manually. I have 765 attributes and weights, which I cannot enter by hand.

How can I generate a weight table for use in RapidMiner from my own data?

Thanks,
-Gagi

Answers

  • dragoljubdragoljub Member Posts: 241 Contributor II
    After some messing around I found a hack to get around this problem. The key is to use the Read Weights operator to read in XML-formatted weights, as shown below:

    <?xml version="1.0" encoding="windows-1252"?>
    <attributeweights version="5.0">
       <weight name="Att_1" value="0.123"/>
       <weight name="Att_2" value="0.213"/>
       <weight name="Att_3" value="0.321"/>
    </attributeweights>
    Once you understand the format, use the Write Special operator to output the formatted XML using this format string:
        <weight name="$v[Attribute]" value="$v[Weight]"/>
    All you need is an example set with a column named Attribute (containing the attribute name string) and a column named Weight (containing the real number weight).

    I manually added the first two lines and the last line to complete the weights file. Once I had imported the file, I saved the weights result in the repository for easy access.

    It would make sense for the Data to Weights operator to handle this case as well. For example, if the data set has columns named Weight and Attribute, the operator should create the weights from them.

    -Gagi
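The weights file format shown in the answer above can also be generated outside RapidMiner, for instance by the same script that computes the weights. Below is a minimal Python sketch under stated assumptions: the input is CSV text with `Attribute` and `Weight` columns (the column names used in the answer), the helper names `weights_to_xml` and `read_weight_table` are hypothetical, and whether Read Weights accepts a UTF-8 encoding declaration and a `5.0` version attribute in your RapidMiner version is something to verify rather than take for granted.

```python
import csv
import io
from xml.sax.saxutils import quoteattr


def weights_to_xml(rows, version="5.0", encoding="UTF-8"):
    """Build attribute-weights XML text from (name, weight) pairs.

    The version and encoding defaults are assumptions; adjust them to
    whatever your RapidMiner installation writes itself.
    """
    lines = [
        f'<?xml version="1.0" encoding="{encoding}"?>',
        f'<attributeweights version="{version}">',
    ]
    for name, weight in rows:
        # quoteattr escapes &, < and > (and quotes where needed) and adds
        # the surrounding quote characters itself.
        lines.append(f'   <weight name={quoteattr(name)} value="{float(weight)}"/>')
    lines.append("</attributeweights>")
    return "\n".join(lines) + "\n"


def read_weight_table(csv_text):
    """Parse CSV text with 'Attribute' and 'Weight' columns into pairs."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [(row["Attribute"], float(row["Weight"])) for row in reader]


# Hypothetical sample data mirroring the three weights shown in the answer.
table = "Attribute,Weight\nAtt_1,0.123\nAtt_2,0.213\nAtt_3,0.321\n"
xml_text = weights_to_xml(read_weight_table(table))
print(xml_text)
```

Writing `xml_text` to a file with the same encoding that the declaration names gives a file in the layout that Read Weights was pointed at above. Unlike a plain format string, this escapes attribute names that contain characters such as `&` or `<`.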
  • landland RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Gagi,
    that is true indeed. Please add a feature request to the bugtracker.

    But your process is cool. Would you share it on myExperiment using the Community Extension?

    Greetings,
      Sebastian
  • RLynxRLynx Member Posts: 18 Contributor II
    I have exactly the same problem:
    I have a table of attribute names and corresponding weights computed independently by an R script.
    I need to convert this Example Set into a weights type.
    The method described above works for me. Thank you, Gagi!
    But it would be nice if the "Data to Weights" operator could handle this situation directly.

    Thank you!
    best,
    RLynx

  • suleymansahalsuleymansahal Member Posts: 27 Contributor II

    I also came to this page looking for a way to convert data to a weight vector. After reading the proposed solution, I tried to automate the process. Until RapidMiner staff add this functionality, the following silly but working process can be used.

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve Titanic Training" width="90" x="45" y="34">
    <parameter key="repository_entry" value="//Samples/data/Titanic Training"/>
    </operator>
    <operator activated="true" class="weight_by_correlation" compatibility="7.6.001" expanded="true" height="82" name="Weight by Correlation" width="90" x="179" y="34">
    <parameter key="normalize_weights" value="true"/>
    <parameter key="sort_direction" value="descending"/>
    </operator>
    <operator activated="true" class="multiply" compatibility="7.6.001" expanded="true" height="103" name="Multiply" width="90" x="313" y="34"/>
    <operator activated="true" class="weights_to_data" compatibility="7.6.001" expanded="true" height="68" name="Weights to Data" width="90" x="447" y="34"/>
    <operator activated="true" class="subprocess" compatibility="7.6.001" expanded="true" height="82" name="Weight Data Converter" width="90" x="581" y="136">
    <process expanded="true">
    <operator activated="true" class="set_macro" compatibility="7.6.001" expanded="true" height="82" name="Temp File Path" width="90" x="45" y="34">
    <parameter key="macro" value="weight_file_path"/>
    <parameter key="value" value="Desktop/average_weight.dat"/>
    </operator>
    <operator activated="true" class="generate_attributes" compatibility="7.6.001" expanded="true" height="82" name="Create Pseudo Attr" width="90" x="179" y="34">
    <list key="function_descriptions">
    <parameter key="_bas" value="&quot;&lt;weight name=&quot;"/>
    <parameter key="_orta" value="&quot; value=&quot;"/>
    <parameter key="_son" value="&quot;/&gt;&quot;"/>
    </list>
    </operator>
    <operator activated="true" class="order_attributes" compatibility="7.6.001" expanded="true" height="82" name="ReOrder" width="90" x="313" y="34">
    <parameter key="attribute_ordering" value="_bas|Attribute|_orta|Weight|_son"/>
    </operator>
    <operator activated="true" class="write_message" compatibility="7.6.001" expanded="true" height="82" name="Write Opening Tags" width="90" x="447" y="34">
    <parameter key="file" value="%{weight_file_path}"/>
    <parameter key="text" value="&lt;?xml version=&quot;1.0&quot; encoding=&quot;windows-1254&quot;?&gt;&#10;&#10;&lt;attributeweights version=&quot;7.6&quot;&gt;&#10;"/>
    </operator>
    <operator activated="true" class="write_special" compatibility="7.6.001" expanded="true" height="68" name="Write Weight Values" width="90" x="581" y="34">
    <parameter key="example_set_file" value="%{weight_file_path}"/>
    <parameter key="special_format" value="$t$a[&quot;]"/>
    <parameter key="quote_nominal_values" value="false"/>
    <parameter key="overwrite_mode" value="append"/>
    </operator>
    <operator activated="true" class="write_message" compatibility="7.6.001" expanded="true" height="82" name="Write Closing Tags" width="90" x="715" y="34">
    <parameter key="file" value="%{weight_file_path}"/>
    <parameter key="text" value="&lt;/attributeweights&gt;"/>
    <parameter key="mode" value="append"/>
    </operator>
    <operator activated="true" class="legacy:read_weights" compatibility="7.6.001" expanded="true" height="68" name="Read Weight File" width="90" x="849" y="34">
    <parameter key="attribute_weights_file" value="%{weight_file_path}"/>
    </operator>
    <connect from_port="in 1" to_op="Temp File Path" to_port="through 1"/>
    <connect from_op="Temp File Path" from_port="through 1" to_op="Create Pseudo Attr" to_port="example set input"/>
    <connect from_op="Create Pseudo Attr" from_port="example set output" to_op="ReOrder" to_port="example set input"/>
    <connect from_op="ReOrder" from_port="example set output" to_op="Write Opening Tags" to_port="through 1"/>
    <connect from_op="Write Opening Tags" from_port="through 1" to_op="Write Weight Values" to_port="input"/>
    <connect from_op="Write Weight Values" from_port="through" to_op="Write Closing Tags" to_port="through 1"/>
    <connect from_op="Read Weight File" from_port="output" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    </operator>
    <connect from_op="Retrieve Titanic Training" from_port="output" to_op="Weight by Correlation" to_port="example set"/>
    <connect from_op="Weight by Correlation" from_port="weights" to_op="Multiply" to_port="input"/>
    <connect from_op="Multiply" from_port="output 1" to_op="Weights to Data" to_port="attribute weights"/>
    <connect from_op="Multiply" from_port="output 2" to_port="result 1"/>
    <connect from_op="Weights to Data" from_port="example set" to_op="Weight Data Converter" to_port="in 1"/>
    <connect from_op="Weight Data Converter" from_port="out 1" to_port="result 2"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    </process>
    </operator>
    </process>
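Before handing a generated weights file to Read Weights, it can help to check that the file is well-formed and contains the expected entries. The sketch below uses only Python's standard library and assumes the `attributeweights` / `weight` layout shown earlier in this thread; `parse_weights` is a hypothetical helper, not part of RapidMiner.

```python
import xml.etree.ElementTree as ET


def parse_weights(xml_bytes):
    """Return {attribute name: weight} parsed from attribute-weights XML.

    Raises ValueError if the root element or a weight entry does not match
    the layout assumed from this thread.
    """
    root = ET.fromstring(xml_bytes)
    if root.tag != "attributeweights":
        raise ValueError(f"unexpected root element: {root.tag}")
    weights = {}
    for entry in root.findall("weight"):
        name = entry.get("name")
        value = entry.get("value")
        if name is None or value is None:
            raise ValueError("weight element is missing name or value")
        weights[name] = float(value)
    return weights


# Hypothetical sample file content in the layout written by the process above.
sample = (
    b'<?xml version="1.0" encoding="UTF-8"?>\n'
    b'<attributeweights version="7.6">\n'
    b'<weight name="Att_1" value="0.123"/>\n'
    b'<weight name="Att_2" value="0.213"/>\n'
    b'</attributeweights>\n'
)
weights = parse_weights(sample)
print(weights)
```

For a file on disk, reading it in binary mode and passing the bytes to `parse_weights` lets the parser pick the encoding from the XML declaration.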
  • sgenzersgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    hello @suleymansahal -

     

    Thank you for this.  After much conversation internally we agree that this would be a nice improvement to "Data to Weights".  Thanks.

     

    Scott

  • Thomas_OttThomas_Ott RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,761 Unicorn

    Doesn't using the Set Role operator to assign the weight role to the attribute column do the job, or am I not understanding the problem correctly?

     

    <?xml version="1.0" encoding="UTF-8"?><process version="7.6.001">
    <context>
    <input/>
    <output/>
    <macros/>
    </context>
    <operator activated="true" class="process" compatibility="7.6.001" expanded="true" name="Process">
    <process expanded="true">
    <operator activated="true" class="retrieve" compatibility="7.6.001" expanded="true" height="68" name="Retrieve Titanic Training" width="90" x="45" y="34">
    <parameter key="repository_entry" value="//Samples/data/Titanic Training"/>
    </operator>
    <operator activated="true" class="weight_by_correlation" compatibility="7.6.001" expanded="true" height="82" name="Weight by Correlation" width="90" x="179" y="34">
    <parameter key="normalize_weights" value="true"/>
    <parameter key="sort_direction" value="descending"/>
    </operator>
    <operator activated="true" class="multiply" compatibility="7.6.001" expanded="true" height="103" name="Multiply" width="90" x="313" y="34"/>
    <operator activated="true" class="weights_to_data" compatibility="7.6.001" expanded="true" height="68" name="Weights to Data" width="90" x="313" y="187"/>
    <operator activated="true" class="multiply" compatibility="7.6.001" expanded="true" height="103" name="Multiply (2)" width="90" x="447" y="187"/>
    <operator activated="true" class="set_role" compatibility="7.6.001" expanded="true" height="82" name="Set Role" width="90" x="581" y="238">
    <parameter key="attribute_name" value="Weight"/>
    <parameter key="target_role" value="weight"/>
    <list key="set_additional_roles"/>
    </operator>
    <operator activated="true" class="subprocess" compatibility="7.6.001" expanded="true" height="82" name="Weight Data Converter" width="90" x="581" y="136">
    <process expanded="true">
    <operator activated="true" class="set_macro" compatibility="7.6.001" expanded="true" height="82" name="Temp File Path" width="90" x="45" y="34">
    <parameter key="macro" value="weight_file_path"/>
    <parameter key="value" value="Desktop/average_weight.dat"/>
    </operator>
    <operator activated="true" class="generate_attributes" compatibility="7.6.001" expanded="true" height="82" name="Create Pseudo Attr" width="90" x="179" y="34">
    <list key="function_descriptions">
    <parameter key="_bas" value="&quot;&lt;weight name=&quot;"/>
    <parameter key="_orta" value="&quot; value=&quot;"/>
    <parameter key="_son" value="&quot;/&gt;&quot;"/>
    </list>
    </operator>
    <operator activated="true" class="order_attributes" compatibility="7.6.001" expanded="true" height="82" name="ReOrder" width="90" x="313" y="34">
    <parameter key="attribute_ordering" value="_bas|Attribute|_orta|Weight|_son"/>
    </operator>
    <operator activated="true" class="write_message" compatibility="7.6.001" expanded="true" height="82" name="Write Opening Tags" width="90" x="447" y="34">
    <parameter key="file" value="%{weight_file_path}"/>
    <parameter key="text" value="&lt;?xml version=&quot;1.0&quot; encoding=&quot;windows-1254&quot;?&gt;&#10;&#10;&lt;attributeweights version=&quot;7.6&quot;&gt;&#10;"/>
    </operator>
    <operator activated="true" class="write_special" compatibility="7.6.001" expanded="true" height="68" name="Write Weight Values" width="90" x="581" y="34">
    <parameter key="example_set_file" value="%{weight_file_path}"/>
    <parameter key="special_format" value="$t$a[&quot;]"/>
    <parameter key="quote_nominal_values" value="false"/>
    <parameter key="overwrite_mode" value="append"/>
    </operator>
    <operator activated="true" class="write_message" compatibility="7.6.001" expanded="true" height="82" name="Write Closing Tags" width="90" x="715" y="34">
    <parameter key="file" value="%{weight_file_path}"/>
    <parameter key="text" value="&lt;/attributeweights&gt;"/>
    <parameter key="mode" value="append"/>
    </operator>
    <operator activated="true" class="legacy:read_weights" compatibility="7.6.001" expanded="true" height="68" name="Read Weight File" width="90" x="849" y="34">
    <parameter key="attribute_weights_file" value="%{weight_file_path}"/>
    </operator>
    <connect from_port="in 1" to_op="Temp File Path" to_port="through 1"/>
    <connect from_op="Temp File Path" from_port="through 1" to_op="Create Pseudo Attr" to_port="example set input"/>
    <connect from_op="Create Pseudo Attr" from_port="example set output" to_op="ReOrder" to_port="example set input"/>
    <connect from_op="ReOrder" from_port="example set output" to_op="Write Opening Tags" to_port="through 1"/>
    <connect from_op="Write Opening Tags" from_port="through 1" to_op="Write Weight Values" to_port="input"/>
    <connect from_op="Write Weight Values" from_port="through" to_op="Write Closing Tags" to_port="through 1"/>
    <connect from_op="Read Weight File" from_port="output" to_port="out 1"/>
    <portSpacing port="source_in 1" spacing="0"/>
    <portSpacing port="source_in 2" spacing="0"/>
    <portSpacing port="sink_out 1" spacing="0"/>
    <portSpacing port="sink_out 2" spacing="0"/>
    </process>
    </operator>
    <connect from_op="Retrieve Titanic Training" from_port="output" to_op="Weight by Correlation" to_port="example set"/>
    <connect from_op="Weight by Correlation" from_port="weights" to_op="Multiply" to_port="input"/>
    <connect from_op="Multiply" from_port="output 1" to_op="Weights to Data" to_port="attribute weights"/>
    <connect from_op="Multiply" from_port="output 2" to_port="result 1"/>
    <connect from_op="Weights to Data" from_port="example set" to_op="Multiply (2)" to_port="input"/>
    <connect from_op="Multiply (2)" from_port="output 1" to_op="Weight Data Converter" to_port="in 1"/>
    <connect from_op="Multiply (2)" from_port="output 2" to_op="Set Role" to_port="example set input"/>
    <connect from_op="Set Role" from_port="example set output" to_port="result 3"/>
    <connect from_op="Weight Data Converter" from_port="out 1" to_port="result 2"/>
    <portSpacing port="source_input 1" spacing="0"/>
    <portSpacing port="sink_result 1" spacing="0"/>
    <portSpacing port="sink_result 2" spacing="0"/>
    <portSpacing port="sink_result 3" spacing="0"/>
    <portSpacing port="sink_result 4" spacing="0"/>
    </process>
    </operator>
    </process>
  • Arjan_SAArjan_SA Member Posts: 5 Contributor I
    Hi @sgenzer,
    It occurs to me that this question was first asked in 2010. Seven years later, you mentioned that you had internally agreed this would be a nice improvement. Now, another year later, the "Data to Weights" operator still has not been improved in this way.

    So now I'm running into this issue as well and still need to use the "silly but working" workaround.

    I'm new to RapidMiner. The great thing about the tool is that there is quite a lot of knowledge about it on the internet. However, it also occurs to me that there are a lot of very old posts (2010, 2012, 2017) where people mention improvements or limitations that still have not been addressed in the latest version.

    Is RapidMiner actually being maintained?

    Thanks,
    Arjan
  • Telcontar120Telcontar120 Moderator, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 1,635 Unicorn
    Did you see the suggestion about using the Set Role operator?  Does that not solve your problem?  If not, can you explain further why not?

    Brian T.
    Lindon Ventures 
    Data Science Consulting from Certified RapidMiner Experts
  • MartinLiebigMartinLiebig Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,503 RM Data Scientist

    I think the Converters Extension has had an operator that does exactly this for quite a while now. I think the thread is simply not up to date. Try ExampleSet to Weights from there.

    BR,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • sgenzersgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    Hi @Arjan_SA, thank you for your comments. Yes, believe me, the software is constantly being improved as well as maintained, better than it ever has been. We do keep all the old comments here on the community, as there are often "nuggets of gold" in them that still ring true years later. But we don't update every discussion, as that would be virtually impossible. :wink:

    So at the end of the day, you can see from @mschmitz that the ExampleSet to Weights operator in the Operator Toolbox does this quite nicely now.

    Happy RapidMining!

    Scott
  • Arjan_SAArjan_SA Member Posts: 5 Contributor I
    Hi @Telcontar120,
    I assume you mean assigning the "weight" special role to an attribute, right?
    If I do that, my examples are not accepted as an object for the "weight" input.

    I'm going to look into the solution @mschmitz provided....
  • Arjan_SAArjan_SA Member Posts: 5 Contributor I
    Hi @mschmitz,

    Thanks for your answer, the ExampleSet to Weights from the Converters Extension does exactly what this thread is about and what I was looking for. I just did not find it myself..... :smile:

    I do understand @sgenzer, that it's pretty impossible to keep all threads fully updated. At least this one is updated now..... :smiley:

    Having said this.... just thinking out loud.... I bet there could be a way to use Text Analytics to extract all issues mentioned in threads and their solution. Then use BI to mix that with your version information and/or backlog. From there it might be possible to create some automatically generated overview of issues raised, questions asked and the most current solution.....?
    Seems like an interesting challenge with pretty powerful results.....?


  • sgenzersgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    haha yes believe it or not I have tackled pieces of this from time to time. In fact you may not have noticed but if someone posts a new question, it gets mysteriously "tagged" about 24 hrs later. That's because I have deployed some text mining that tags all new discussions every night. :wink:

    If you (or anyone else) are interested in using this community as a text mining project for these kinds of ideas, I am more than happy to give you the data set to play with. Just let me know.

    Scott
