[Solved] Data type "real" vs. "numeric"

qwertz Member Posts: 130 Maven
Dear community,

At first sight this seems to be a very basic question, but even after reviewing the manual and searching this forum I was not able to find an answer:

What is the difference between "real" and "numeric" values in RapidMiner? When should I use which one?

Real numbers can, by definition, be any arbitrary point on a number line. In contrast, the manual describes numeric values as "for numerical values in general". This sounds pretty much the same to me... but probably it is not...



  • aborg Member Posts: 66 Contributor II
    I am not 100% sure, but imho numeric can be either real or integer (the difference between those two lies, for example, in the semantics of division). In that regard it might make sense (for example, if your operator can handle either integer or real values without conversion, numeric might be what you expect). But I could also be wrong.
  • Nils_Woehler Member Posts: 463 Guru

    aborg is right. Numeric can be either real or integer, whereas real can only be.. real ;-)

  • qwertz Member Posts: 130 Maven

    Thank you for your response.

    From a mathematical point of view the integers are a subset of the reals. So I guess RapidMiner distinguishes between real and numerical for performance reasons then? When should I use which one?

    Best regards
  • wessel Member Posts: 537 Guru
    Most likely someone from the RM team can say something more sensible than I can.
    But until they do, here are my thoughts.

    The mathematical classification is irrelevant here.
    In computer science floats and integers are different!
    See the Java documentation on how floats and integers affect division, multiplication, etc.

    As with anything related to performance, test and find out!
    The Java Virtual Machine basically makes it impossible to give a general statement about performance.

    As a rule of thumb, you always want to use floats unless you have a specific reason to use integers.
    When I use "Generate ID" and then "Generate Attributes" with id=id/2, the id attribute is automatically converted from integer to real.
    Try using "Read CSV" with datamanagement = int_array; you will get some funny results when you try to read floats.

    Best regards,

  • qwertz Member Posts: 130 Maven

    Hi Wessel,

    I understand your arguments, and I agree that under these circumstances it is probably reasonable to run a test.

    So if I have float values (e.g. temperature readings, stock market data or weight data) I could use either the numeric or the real data type, right? I would just test performance and go for the faster one.

    Best regards
  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Internally in RapidMiner, Reals, Integers and Numericals are all the same and are represented as floating point numbers. The type is just a representation for the user (and for learning schemes) that comes with some rules, e.g. only whole numbers are stored in integer attributes.

    In the RapidMiner type hierarchy, Numerical is the supertype of Integer and Real, and should be avoided. We are planning to deprecate the direct use of Numerical and recommend using only the well-defined subtypes, i.e. Integer and Real.

    Best regards,
  • qwertzqwertz Member Posts: 130  Maven

    Alright, then I'll go for "real" in my case.

    Thanks a lot! :)
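
For readers who want to see the Java behaviour this thread refers to, here is a minimal plain-Java sketch. It does not use any RapidMiner API; the class name NumericTypesDemo and its helper methods are hypothetical and exist only for illustration. It shows how integer and floating-point division differ (the id=id/2 case above), and a simple whole-number check that a double would have to pass before it could sensibly be labelled "integer". How RapidMiner enforces that rule internally is not shown here.

```java
public class NumericTypesDemo {

    // Integer division truncates toward zero, so the fractional part is lost.
    static int halveAsInteger(int id) {
        return id / 2;
    }

    // Dividing by a double literal promotes the operation to floating point.
    static double halveAsReal(int id) {
        return id / 2.0;
    }

    // Hypothetical helper: true if a double holds a finite whole number.
    // This only illustrates the "whole numbers only" rule for integer attributes;
    // it is an assumption, not RapidMiner's actual check.
    static boolean isWholeNumber(double value) {
        return !Double.isNaN(value)
                && !Double.isInfinite(value)
                && value == Math.rint(value);
    }

    public static void main(String[] args) {
        System.out.println(halveAsInteger(5));  // prints 2
        System.out.println(halveAsReal(5));     // prints 2.5
        System.out.println(isWholeNumber(3.0)); // prints true
        System.out.println(isWholeNumber(2.5)); // prints false
    }
}
```

Under these assumptions, dividing an integer id by 2 can leave the whole numbers, which fits with the attribute being converted to real in the example above.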
