
Difference between various data types

lmsasu Member Posts: 20 Contributor II
edited November 2018 in Help
Hi all,

Quite a newbie question: in RM5, what is the difference between "numeric" and "real" when defining metadata? Where could I find quick help on topics like these? The "rapidminer-5.0-manual-english_v1.0.pdf" and "rapidminer-4.6-tutorial.pdf" do not cover these simple subjects.

Thanks,
Lucian

Answers

    land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    actually, numeric is the supertype of real and integer.
    The same holds for nominal, which is the supertype of polynominal, text and binominal.

    Greetings,
      Sebastian
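The hierarchy Sebastian describes can be sketched as a parent table with an "is a" check that walks up from a type to its supertype. This is a hypothetical Python model for illustration only: the `PARENT` table and `is_a` function are assumptions, not RapidMiner's API, and only the type names mentioned in the thread are included.

```python
# Hypothetical model of the value-type hierarchy described above.
# Names mirror the forum post; PARENT and is_a() are illustrative, not RapidMiner code.
PARENT = {
    "real": "numeric",
    "integer": "numeric",
    "polynominal": "nominal",
    "binominal": "nominal",
    "text": "nominal",
}


def is_a(value_type: str, super_type: str) -> bool:
    """True if value_type equals super_type or is one of its subtypes."""
    current = value_type
    while current is not None:
        if current == super_type:
            return True
        current = PARENT.get(current)  # roots ("numeric", "nominal") have no parent
    return False


print(is_a("real", "numeric"))         # True: real is a numeric type
print(is_a("binominal", "nominal"))    # True
print(is_a("nominal", "polynominal"))  # False: a supertype is not a subtype
print(is_a("integer", "real"))         # False: sibling types are unrelated
```

The check only goes upward, so anything that accepts "numeric" also accepts "real" and "integer", but not the other way round.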
    lmsasu Member Posts: 20 Contributor II
    Thanks. Does any of the documentation specify this? Or am I supposed to read the code?  :o

    Lucian
    land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    reading the code would be a perfect idea, unless you want to take a look into the manual. Anyway, I think reading the manual up to page 12 is easier...

    http://sourceforge.net/projects/rapidminer/files/1.%20RapidMiner/5.0/rapidminer-5.0-manual-english_v1.0.pdf/download

    Greetings,
      Sebastian
    pablo_admig Member Posts: 5 Contributor II
    Sebastian,
      I have read page 12 of the manual, and I can't see a difference between nominal and polynominal, because both can handle categorical values. I mean, if you have the variable "color" (red, green and blue), you have a categorical variable and therefore a nominal variable, so isn't the "poly" prefix redundant?
    What would be the difference between them, algorithmically speaking? (The same question applies to the numerical types.)

    Thanks in advance.
    Pablo.
    IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi Pablo,

    you are right: from an algorithmic point of view there is currently no difference between "polynominal" and "nominal". As far as I know, all operators that can handle one of the two can automatically handle both (somebody please correct me if I forgot an operator where this would indeed make a difference). But who knows: maybe such a difference will appear later for a new operator, and the ontology used can be seen as a preparation for that. In today's practical processes, however, you will be perfectly fine using either option, as long as you make sure that all operators are happy  ;)

    The same is true for the numerical value types, although I think there actually is (or at least was) some algorithm that relied on the input being "real" instead of "numerical"...

    Cheers,
    Ingo
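One way to picture Ingo's point: if an operator declares the value type it needs and accepts any attribute whose type is that type or a subtype, then "nominal" and "polynominal" attributes pass the same checks, while an operator insisting on "real" would reject "integer". The Python sketch below assumes exactly that subtype rule; `PARENT`, `is_a` and `unsupported` are hypothetical names, not RapidMiner's capability API.

```python
# Hypothetical capability check: an operator requiring `required_type` accepts an
# attribute if the attribute's type is that type or one of its subtypes.
PARENT = {
    "real": "numeric",
    "integer": "numeric",
    "polynominal": "nominal",
    "binominal": "nominal",
    "text": "nominal",
}


def is_a(value_type: str, super_type: str) -> bool:
    """Walk up from value_type until super_type or a root type is reached."""
    current = value_type
    while current is not None:
        if current == super_type:
            return True
        current = PARENT.get(current)
    return False


def unsupported(required_type: str, attributes: dict[str, str]) -> list[str]:
    """Names of the attributes an operator requiring required_type would reject."""
    return [name for name, vtype in attributes.items() if not is_a(vtype, required_type)]


example_set = {"color": "polynominal", "flag": "binominal", "weight": "real"}
print(unsupported("nominal", example_set))                        # ['weight']
print(unsupported("real", {"age": "integer", "height": "real"}))  # ['age']
```

Under this assumed rule the nominal/polynominal choice never matters for an operator that asks for "nominal", which matches Ingo's practical advice.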
    andk Member Posts: 21 Contributor II
    Ingo Mierswa wrote:

    Hi Pablo,

    you are right: from an algorithmic point of view there is currently no difference between "polynominal" and "nominal". As far as I know, all operators which can handle one of both can automatically handle both (please correct me somebody if I forgot an operator where this would indeed make a difference). [...]

    Cheers,
    Ingo
    Ingo, I think I found one: the Cross Distances operator. I have two lists of polynominal expressions and tried to match them against each other, but the results I get are not really helpful. Assuming I am not completely nuts, I think this hinges on the fact that the Wordlist to Data operator produces a polynominal attribute, while the Cross Distances operator only accepts nominal attribute types.
    land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    actually I doubt this, because each polynominal attribute is a nominal attribute.
    I think you are trying to compute the distance between "mule" and "donkey". What is the distance? There's only one sane answer: 1. And what's the distance between "mule" and "horse"? Yes, 1. "mule" and "mule" would be zero, if you haven't already guessed...

    RapidMiner currently provides only this distance measure between nominal values. So I doubt that a process comparing wordlists row by row makes any sense at all.

    Greetings,
      Sebastian
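The nominal distance Sebastian describes (0 for identical values, 1 for anything else) and a table of that distance over every pair of rows from two lists can be sketched as follows. This is a hypothetical Python illustration of the measure as stated in the post, not the implementation of the Cross Distances operator; the function names are made up.

```python
# Hypothetical sketch of the nominal distance described above:
# 0 if two nominal values are identical, 1 otherwise.
def nominal_distance(a: str, b: str) -> int:
    return 0 if a == b else 1


def cross_distances(left: list[str], right: list[str]) -> list[list[int]]:
    """Distance between every row of `left` (outer list) and every row of `right`."""
    return [[nominal_distance(a, b) for b in right] for a in left]


left = ["mule", "horse"]
right = ["donkey", "mule"]
print(cross_distances(left, right))  # [[1, 0], [1, 1]]
```

Because the measure only distinguishes "same" from "different", every cell is either 0 or 1; a zero marks an exact match such as mule/mule.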
    andk Member Posts: 21 Contributor II
    Hi Sebastian, thanks for your answers in both of the threads. Yes, indeed this would help if all combinations were calculated, i.e. each row of vector a with each row of vector b. Ideally, the operator I am looking for would give me a 1 in case of a match and a 0 otherwise. Is there something like this? To follow your example, I am looking for a mule-mule match! I have tried Cross Distances, but the results are completely strange: even where there should be a match (judging by comparing the lists myself), it gives me a distance of 1. So I guess I am not using this operator correctly.

    Best regards, Andre
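What Andre asks for (check every row of list a against every row of list b and report the matches) can be sketched in Python with `itertools.product`. The `match_pairs` function and its `normalize` flag are hypothetical; exact string comparison treats "Zebra " and "zebra" as different values, and ignoring case and surrounding whitespace is an assumption added here for illustration, not something the thread confirms as the cause of the unexpected distances.

```python
from itertools import product


def match_pairs(left: list[str], right: list[str], normalize: bool = False) -> list[tuple[int, int]]:
    """Return (i, j) for every row i of `left` and row j of `right` holding the same value.

    With normalize=True, surrounding whitespace and letter case are ignored
    (an assumption for illustration only).
    """
    def key(value: str) -> str:
        return value.strip().lower() if normalize else value

    return [
        (i, j)
        for (i, a), (j, b) in product(enumerate(left), enumerate(right))
        if key(a) == key(b)
    ]


left = ["mule", "horse", "Zebra "]
right = ["donkey", "mule", "zebra"]
print(match_pairs(left, right))                  # [(0, 1)]
print(match_pairs(left, right, normalize=True))  # [(0, 1), (2, 2)]
```

A 1-for-match indicator is simply membership in this list, i.e. one minus the 0/1 nominal distance for that pair.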
    colo Member Posts: 236 Maven
    Ingo Mierswa wrote:

    Hi Pablo,

    you are right: from an algorithmic point of view there is currently no difference between "polynominal" and "nominal". As far as I know, all operators which can handle one of both can automatically handle both (please correct me somebody if I forgot an operator where this would indeed make a difference). [...]

    Cheers,
    Ingo

    Hi Ingo,

    I did notice an operator where the distinction between nominal and polynominal makes a difference. I often build web mining processes where extracted data is incrementally written to a database (appended to a table). The same process is repeated after a few days to collect data that was missed during the first run (timeouts etc.) as well as recently added content.
    To find only those examples, I import the relevant URLs (Read Excel) and load the already collected items from the database (Read Database). Both operators are followed by "Set Role" to set IDs. Finally, the "Set Minus" operator builds the desired example set. The attribute obtained from the database is usually nominal, and the one from the Excel file is polynominal. Process execution is interrupted because the "Set Minus" operator complains about incompatible types and requests an attribute of type polynominal. Since there is no convenient way of changing the attribute obtained from the database from nominal to polynominal, I always set nominal instead of polynominal in the "Read Excel" operator. It doesn't cause me much trouble, but it shows a case where there is a difference between the two types. I don't know whether the distinction is really necessary there...

    Regards
    Matthias
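A minimal Python sketch of the situation Matthias describes, assuming Set Minus requires both ID attributes to have exactly the same value type. That strict check is inferred from the error he reports, not taken from RapidMiner's source, and the `set_minus` function and example URLs are hypothetical.

```python
# Hypothetical model of Set Minus with a strict ID-type check, as reported above.
def set_minus(left_ids: list[str], left_type: str,
              right_ids: list[str], right_type: str) -> list[str]:
    """Keep the IDs of the left set that do not occur in the right set."""
    if left_type != right_type:
        # Assumed behavior: nominal vs. polynominal counts as incompatible.
        raise TypeError(f"incompatible ID types: {left_type} vs {right_type}")
    already_collected = set(right_ids)
    return [i for i in left_ids if i not in already_collected]


urls_excel = ["http://example.com/1", "http://example.com/2", "http://example.com/3"]
urls_db = ["http://example.com/2"]

try:
    set_minus(urls_excel, "polynominal", urls_db, "nominal")
except TypeError as err:
    print(err)  # incompatible ID types: polynominal vs nominal

# Matthias's workaround: read the Excel IDs as nominal, so both types agree.
print(set_minus(urls_excel, "nominal", urls_db, "nominal"))
# ['http://example.com/1', 'http://example.com/3']
```

A more lenient variant would compare the types through their common supertype instead of requiring equality, which is what Ingo's "no algorithmic difference" remark would suggest.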
    land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hi,
    if you want to compare lists that contain subsets of each other and count the number of shared entries, you can use set operations on the example sets: remove all entries that are not within both (Intersect) and count the remaining examples. You can also extract the number of examples as a macro or as a performance value.

    Greetings,
      Sebastian
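Sebastian's suggestion (intersect the two lists, then count what is left) can be sketched in Python as below. The `intersect` function is a hypothetical stand-in for the Intersect operator, assumed to keep the entries of the first list whose value also occurs in the second; how the real operator treats duplicates is not covered in the thread.

```python
# Hypothetical stand-in for Intersect followed by counting the remaining examples.
def intersect(left: list[str], right: list[str]) -> list[str]:
    """Keep the entries of `left` whose value also occurs in `right`."""
    in_right = set(right)
    return [value for value in left if value in in_right]


list_a = ["mule", "horse", "zebra"]
list_b = ["donkey", "mule", "zebra"]
shared = intersect(list_a, list_b)
print(shared)       # ['mule', 'zebra']
print(len(shared))  # 2, the number of shared entries
```

In a process, the final count corresponds to the number of examples that Sebastian suggests extracting as a macro or performance value.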