How to add noise without adding attributes

navyboys Member Posts: 15 Contributor II
edited November 2018 in Help
I just want to add noise data based on the original attribute set, without adding "random" attributes. I used the "Add Noise" operator and set the parameter "random attributes = 0". As expected, it generates no random attributes, but it also does not generate the noise data I want. Are there any other parameters I should take into account?

Answers

  • colo Member Posts: 236 Maven
    Hi,
    the noise generated for existing attributes is completely independent of the random attributes. You can set them to 0, as you did, without affecting the noise itself. Did you take a look at the operator description and set the "attribute filter type" and at least one of the noise parameters (label noise, default attribute noise, noise)?
    I just created a simple example in which two attributes (att1, att2) are generated and then copied, rounded down to integer values (att1_int, att2_int), to make the effect of the noise clearly visible. Noise is then added to both copies: I set the "default attribute noise" parameter to 1.0, and the results show the noise added to att1_int and att2_int.
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.0.8" expanded="true" name="Process">
        <process expanded="true" height="269" width="413">
          <operator activated="true" class="generate_data" compatibility="5.0.8" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
            <parameter key="number_examples" value="10"/>
            <parameter key="number_of_attributes" value="2"/>
          </operator>
          <operator activated="true" class="generate_attributes" compatibility="5.0.8" expanded="true" height="76" name="Generate Attributes" width="90" x="179" y="30">
            <list key="function_descriptions">
              <parameter key="att1_int" value="floor(att1)"/>
              <parameter key="att2_int" value="floor(att2)"/>
            </list>
          </operator>
          <operator activated="true" class="add_noise" compatibility="5.0.8" expanded="true" height="94" name="Add Noise" width="90" x="313" y="30">
            <parameter key="attribute_filter_type" value="subset"/>
            <parameter key="attributes" value="att1_int|att2_int"/>
            <parameter key="default_attribute_noise" value="1.0"/>
            <list key="noise"/>
          </operator>
          <connect from_op="Generate Data" from_port="output" to_op="Generate Attributes" to_port="example set input"/>
          <connect from_op="Generate Attributes" from_port="example set output" to_op="Add Noise" to_port="example set input"/>
          <connect from_op="Add Noise" from_port="example set output" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
  • navyboys Member Posts: 15 Contributor II
    Thank you very much for your kindness.

    Perhaps I misunderstood what the "Add Noise" operator does. I presumed that new data records, which could be treated as "noise", would be added to the ExampleSet. For example, if we have 10 data records before "Add Noise", then after "Add Noise" is applied, extra data records (e.g. 5) would be added, so the total number of records would be 15. Did I get this wrong?

    Thanks & Best Regards
  • colo Member Posts: 236 Maven
    Hello again,

    I see what you expected from the operator, but I think it is the wrong choice for your task. As the description says ("Adds noise to existing attributes or add random attributes."), it only perturbs existing attribute values or adds attributes; it does not add examples. To get new examples into your existing set, you could perhaps use "Generate (Nominal) Data" in combination with "Rename" (to adjust the attributes) and "Append" or "Union" (which does not require exactly the same attributes).
    But what about the generated values? Do you have a generation rule for filling in realistic data values? Should your noise be unwanted data without meaning, or should it represent a perturbation of wanted values/signals (due to technical issues such as measurement inaccuracy or something similar)? In the first case you should be fine with the "Generate Data" approach, inserting some nonsense data as noise (for example "spam" that must be filtered out from the real information). Otherwise I would suggest applying noise (via "Add Noise") to real data. If you want your ExampleSet to be extended by some noise examples, you might copy existing examples, add noise to them, and then merge both sets together.
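
    For the first case (meaningless extra examples), a minimal sketch could look like the process below. It reuses only the "Generate Data" and "Append" operators and the port names that appear in the other processes of this thread. It assumes that both "Generate Data" operators, left at their default settings, produce the same attribute names and types, so no "Rename" step is needed. The operator name "Generate Noise Examples" and the layout coordinates are made up for illustration, and the first "Generate Data" is only a stand-in for your real data.

    ```xml
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
      <context>
        <input/>
        <output/>
        <macros/>
      </context>
      <operator activated="true" class="process" compatibility="5.0.8" expanded="true" name="Process">
        <process expanded="true" height="415" width="681">
          <!-- Stand-in for the real data: 10 examples -->
          <operator activated="true" class="generate_data" compatibility="5.0.8" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
            <parameter key="number_examples" value="10"/>
          </operator>
          <!-- 5 additional generated examples that play the role of noise records (hypothetical operator name) -->
          <operator activated="true" class="generate_data" compatibility="5.0.8" expanded="true" height="60" name="Generate Noise Examples" width="90" x="45" y="165">
            <parameter key="number_examples" value="5"/>
          </operator>
          <!-- Append stacks both sets; this assumes identical attributes in both inputs -->
          <operator activated="true" class="append" compatibility="5.0.8" expanded="true" height="94" name="Append" width="90" x="313" y="75"/>
          <connect from_op="Generate Data" from_port="output" to_op="Append" to_port="example set 1"/>
          <connect from_op="Generate Noise Examples" from_port="output" to_op="Append" to_port="example set 2"/>
          <connect from_op="Append" from_port="merged set" to_port="result 1"/>
          <portSpacing port="source_input 1" spacing="0"/>
          <portSpacing port="sink_result 1" spacing="0"/>
          <portSpacing port="sink_result 2" spacing="0"/>
        </process>
      </operator>
    </process>
    ```

    If this works as intended, the result should contain 15 examples (10 + 5). With real data, the generated attributes would most likely need to be adjusted to match yours before appending, which is where "Rename" comes in.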

    If you have 10 examples and want to add 5 additional "noisy" examples, you could apply a "Sample" operator to randomly choose 5 examples, apply noise to them, and append them to the original set:
    <?xml version="1.0" encoding="UTF-8" standalone="no"?>
    <process version="5.0">
     <context>
       <input/>
       <output/>
       <macros/>
     </context>
     <operator activated="true" class="process" compatibility="5.0.8" expanded="true" name="Process">
       <process expanded="true" height="415" width="681">
         <operator activated="true" class="generate_data" compatibility="5.0.8" expanded="true" height="60" name="Generate Data" width="90" x="45" y="30">
           <parameter key="number_examples" value="10"/>
         </operator>
         <operator activated="true" class="sample" compatibility="5.0.8" expanded="true" height="76" name="Sample" width="90" x="179" y="165">
           <parameter key="sample_size" value="5"/>
           <list key="sample_size_per_class"/>
           <list key="sample_ratio_per_class"/>
           <list key="sample_probability_per_class"/>
         </operator>
         <operator activated="true" class="add_noise" compatibility="5.0.8" expanded="true" height="94" name="Add Noise" width="90" x="313" y="75">
           <parameter key="default_attribute_noise" value="1.0"/>
           <list key="noise"/>
         </operator>
         <operator activated="true" class="append" compatibility="5.0.8" expanded="true" height="94" name="Append" width="90" x="514" y="165"/>
         <connect from_op="Generate Data" from_port="output" to_op="Sample" to_port="example set input"/>
         <connect from_op="Sample" from_port="example set output" to_op="Add Noise" to_port="example set input"/>
         <connect from_op="Sample" from_port="original" to_op="Append" to_port="example set 2"/>
         <connect from_op="Add Noise" from_port="example set output" to_op="Append" to_port="example set 1"/>
         <connect from_op="Append" from_port="merged set" to_port="result 1"/>
         <portSpacing port="source_input 1" spacing="0"/>
         <portSpacing port="sink_result 1" spacing="0"/>
         <portSpacing port="sink_result 2" spacing="0"/>
       </process>
     </operator>
    </process>
    Perhaps there are other (better) ways to solve this problem, but I think this should do as a start. Hope this helps.

    Regards,
    Matthias
  • navyboys Member Posts: 15 Contributor II
    Hi, Matthias

    It is very kind of you to give me such a detailed explanation.
    It's exactly what I was looking for.

    Thank you very much for your help.

    Thanks & Best Regards
    ZHENG