Simple preprocessing methods

Contemno Member Posts: 6 Contributor II
edited November 2018 in Help
Hello there,

I'm looking for simple preprocessing methods.
Maybe I'm just blind but I can't find anything that matches my criteria.

1. A simple recoding.
For example, produce an attribute B out of an existing attribute A (containing values from 1 to 5) by the following rules:
A: 1,2 --> B: 1
A: 3,4 --> B: 2
A: 5 --> B: 3

2. A simple conditional recoding.
For example, produce an attribute B out of an existing attribute A (containing ages of humans) like this:
A: up to 18 --> B: 1 or "young"
A: 19-40 --> B: 2 or "midage"
A: from 41 --> B: 3 or "old"

Thank you in advance.
Greets from the baltic sea,
Sebastian L.

Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi Sebastian,
    there is a simple operator called UserBasedDiscretization that does exactly what you are looking for. To solve your second problem, you might edit the list as follows:
    First line is called young and its upper limit is 18, so the interval runs from negative infinity to 18.
    Second line is called midage and its upper limit is 40, so the interval covers the values above 18 up to 40.
    Third line is called old and gets an upper limit above any age in your data, for example 2000.

    This would look like this in XML:

              <parameter key="young" value="18.0"/>
              <parameter key="midage" value="40.0"/>
              <parameter key="old" value="2000.0"/>
    To solve the first problem, you could combine UserBasedDiscretization with another operator called NominalNumbers2Numerical. I think you can guess where this leads :) Just enter a number like "1" or "2" as the name of each new value, then use this operator to turn the attribute into a numerical one if you need it numerical. If you only need to change one or a few of the attributes, use AttributeSubsetPreprocessing to select the attributes the inner operators should work on.
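
The two recodings described above can be pictured outside RapidMiner with a small Python sketch. It is only an illustration, not the operator's implementation: the function names `discretize` and `recode` are made up here, and it assumes that a value equal to an upper limit falls into that class (so 18 counts as young), which is what the original question asks for.

```python
from bisect import bisect_left

# Upper limits and class names taken from the XML fragment in this post.
LIMITS = [18.0, 40.0, 2000.0]
LABELS = ["young", "midage", "old"]


def discretize(value, limits=LIMITS, labels=LABELS):
    """Return the label of the first class whose upper limit is >= value.

    Assumption: upper limits are inclusive.
    """
    index = bisect_left(limits, value)
    if index == len(limits):
        raise ValueError(f"{value} exceeds the largest upper limit")
    return labels[index]


def recode(value):
    """Problem 1: limits 2, 4 and 5 with the labels "1", "2", "3",
    followed by turning the label text into a number."""
    label = discretize(value, limits=[2, 4, 5], labels=["1", "2", "3"])
    return int(label)


if __name__ == "__main__":
    print([discretize(age) for age in [5, 18, 19, 40, 41]])
    print([recode(a) for a in [1, 2, 3, 4, 5]])
```

The second function mirrors the two-step idea from the post: discretize into classes named like numbers first, then convert those names into real numbers.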

    Hope I could help,
      Greetings Sebastian
  • Contemno Member Posts: 6 Contributor II
    Thx for your answer.
    Unfortunately it's not working as it should.

    When I use the operator Nominal2Numeric, the values are changed completely.
    A "48" may be changed to a "1" (not the recoding mentioned above).

    The problem is that without this operator the recoding isn't done on this value.

    A second problem occurred. How can I delete rows with missing values? Incomplete ones, in other words.

    You told me to use AttributeSubsetPreprocessing to select the attributes I need to change.
    But this operator is only able to select one attribute, isn't it?
    Maybe there is a possibility to define more than one attribute in "attribute_name_regex"?
    I need to define by name the attributes on which the following operators work.

    Thx for any help.
  • TobiasMalbrecht Moderator, Employee, Member Posts: 294 RM Product Management
    Hi Sebastian,

    hm, that's three questions in a single posting ... so, here we go ...
    Contemno wrote:

    Thx for your answer.
    Unfortunately it's not working as it should.

    When I use the operator Nominal2Numeric, the values are changed completely.
    A "48" may be changed to a "1" (not the recoding mentioned above).

    The problem is that without this operator the recoding isn't done on this value.
    You might have missed that (the other) Sebastian has recommended the NominalNumbers2Numeric operator, not the Nominal2Numeric operator! This should work as expected.
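
One plausible way to picture why the two operators give such different results is sketched below in Python. This is an assumption about their behaviour based on what was reported in this thread (a "48" turning into a "1"), not code taken from RapidMiner; the helper names `index_code` and `parse_numbers` are hypothetical.

```python
def index_code(column):
    """Replace each distinct nominal value by the position at which it
    first appeared. The number carries no meaning beyond being an index."""
    codes = {}
    return [codes.setdefault(value, len(codes)) for value in column]


def parse_numbers(column):
    """Read each nominal value as the number its text spells out."""
    return [float(value) for value in column]


if __name__ == "__main__":
    column = ["17", "48", "17", "63"]
    print(index_code(column))     # "48" ends up as 1 here
    print(parse_numbers(column))  # "48" stays 48.0
```

Under this reading, only the second kind of conversion keeps labels such as "1", "2" and "3" as the numbers 1, 2 and 3.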
    Contemno wrote:

    A second problem occurred. How can I delete rows with missing values? Incomplete ones, in other words.
    This can be done by filtering the example set with the operator called ExampleFilter and setting the condition_class parameter to "no_missing_attributes".
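
As a rough illustration of what such a filter does, here is a minimal Python sketch that keeps only rows without any missing value. The attribute names come from this thread, the data values are invented, and `is_missing` and `drop_incomplete` are hypothetical helpers rather than anything RapidMiner provides.

```python
import math


def is_missing(value):
    """Treat None and NaN as missing."""
    return value is None or (isinstance(value, float) and math.isnan(value))


def drop_incomplete(rows):
    """Keep only the rows in which no attribute is missing."""
    return [row for row in rows if not any(is_missing(v) for v in row.values())]


if __name__ == "__main__":
    rows = [
        {"ID": 1, "age": 34, "regio": "north"},
        {"ID": 2, "age": None, "regio": "south"},
        {"ID": 3, "age": 51, "regio": float("nan")},
    ]
    print(drop_incomplete(rows))
```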
    Contemno wrote:

    You told me to use AttributeSubsetPreprocessing to select the attributes I need to change.
    But this operator is only able to select one attribute, isn't it?
    Maybe there is a possibility to define more than one attribute in "attribute_name_regex"?
    I need to define by name the attributes on which the following operators work.
    The "attribute_name_regex" parameter does indeed accept regular expressions to define the attributes. Hence, the operators inside AttributeSubsetPreprocessing are applied to all attributes matching the regular expression. If you want, for example, to apply the inner operators to two attributes called age and weight, the regular expression that selects them is age|weight . You may find additional information on regular expressions in the RapidMiner tutorial, which is available in the documentation area of our website:

    http://rapid-i.com/content/view/36/83/lang,de/
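
The effect of an alternation such as age|weight can be tried out in a few lines of Python. This is only a sketch: it assumes the expression has to match the whole attribute name (as Java's `String.matches` does), and `select_attributes` is a hypothetical helper; the names height and average are invented for the example.

```python
import re


def select_attributes(names, pattern):
    """Return the names that the pattern matches completely."""
    regex = re.compile(pattern)
    return [name for name in names if regex.fullmatch(name)]


if __name__ == "__main__":
    names = ["ID", "age", "weight", "height", "average"]
    print(select_attributes(names, "age|weight"))
```

Matching the whole name matters here: "average" contains the letters "age" but is not selected.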

    Hope that helps to solve your problems,
    regards,
    Tobias
  • Contemno Member Posts: 6 Contributor II
    Thank you so much Tobias.
    You helped me a lot. It's working very well now.

    But there's another question. You wrote:
    If you want, for example, to apply the inner operators to two attributes called age and weight, the regular expression that selects them is age|weight .

    I'm not familiar with regular expressions. You gave an example with an " | " to combine two attributes.
    The tutorial is a bit "meager" in this case. Is there any good explanation of all expressions? (wildcards, ...)

    Here my case:
    I've 56 attributes (e.g. ID, age, regio, ANT_U30, ANT_U35, ... , P_Expert, P_Vkude,...).
    Now I wanna filter all attributes beginning with "ANT_" because there are twelve of them and I don't wanna write them all down separately.
    In short a shortcut for "ANT_U20|ANT_U25|ANT_U30|ANT_U35|...".

    Thx in advance.
    Sebastian
  • steffen Member Posts: 347 Maven
    Hello Sebastian

    The pattern you are looking for is: ANT_.*
    where . stands for any character and * means "repeated zero or more times".
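
In Java-style regular expressions, `.` matches any single character and `*` repeats whatever precedes it, so "everything starting with ANT_" is written `ANT_.*`. The short Python sketch below shows this on the attribute names from the question. It assumes whole-name matching, and `matching` is a hypothetical helper, not a RapidMiner function.

```python
import re


def matching(names, pattern):
    """Return the names that the pattern matches completely."""
    return [name for name in names if re.fullmatch(pattern, name)]


if __name__ == "__main__":
    names = ["ID", "age", "regio", "ANT_U30", "ANT_U35", "P_Expert", "P_Vkude"]
    print(matching(names, r"ANT_.*"))  # prefix match
    print(matching(names, r"ANT_*"))   # "ANT" plus any number of underscores only
```

The second call is a common pitfall: without the dot, the `*` applies to the underscore, so none of the ANT_ attributes are selected.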

    To learn more about regular expressions:
    basic concepts: http://en.wikipedia.org/wiki/Regular_expression
    tutorial for regular expressions in java : http://www.javaregex.com/tutorial.html (weird design, but the tutorial is nice)

    hope this was helpful

    greetings

    Steffen