
What is the maximum amount of rows

Jeffersonjpa Member Posts: 5 Contributor I
What is the maximum number of rows you have already imported into RapidMiner? 10 million?

Best Answers

  • David_A Administrator, Moderator, Employee, RMResearcher, Member Posts: 297 RM Research
    Solution Accepted
    You mean, what's the largest data set you can work with?
    That depends heavily on your available hardware (storage space, RAM, ...), but other than that, there is no limit (assuming you don't hit your license limit). On my travel laptop with only 8 GB of RAM, I could easily create a test data set with 10 million rows of random data.
    But of course, once you actually start working with the data, the memory requirements and practical runtime limits become more complex.

    I hope that helps.
  • sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
    Solution Accepted
    hi @Jeffersonjpa I don't think you're really going to get an answer to this question :smiley: Almost all of our customers use proprietary data, and hence we are not able to give you what you're looking for. I can, however, share this example of just how powerful the platform is - given enough resources. It is from an unnamed commercial customer running real data:

    Dataset: 1.5m examples (rows), 49 attributes (columns) of which 5 were nominal and 44 were numerical
    Hardware: cluster of 64 AMD Opteron 6380 processors (16 cores each, 2.5 GHz), 504GB RAM with 384GB swap

    Generalized Linear Model (GLM): runtime = 1 min 21 sec
    Deep Learning (H2O implementation): runtime = 7 min 29 sec

    User reported that all CPUs were "pegged" during this run with up to 180GB being consumed at times.

    Does this help? It's one example. Another data set with the same number of rows and columns could produce very different runtimes depending on what those rows and columns contain. All I'm trying to share is that RapidMiner will use pretty much whatever resources you throw at it.

    Scott
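As a rough illustration of the numbers in the example above, here is a back-of-envelope memory estimate for a tabular data set. This is my own sketch, not RapidMiner's actual internal storage model: it assumes every cell is stored as one 8-byte double-precision value and ignores object overhead, indexes, and nominal-value dictionaries, so real usage will be higher.

```python
def estimate_memory_gb(rows, numerical_cols, nominal_cols, bytes_per_value=8):
    """Rough lower bound on in-memory size: every cell as one 8-byte value.

    This is a simplification for illustration only; actual memory usage
    depends on the tool's internal representation.
    """
    total_cells = rows * (numerical_cols + nominal_cols)
    return total_cells * bytes_per_value / 1024**3  # bytes -> GiB


# The customer data set above: 1.5M rows, 44 numerical + 5 nominal columns
print(round(estimate_memory_gb(1_500_000, 44, 5), 2))  # ~0.55 GiB of raw values
```

The raw values are well under a gigabyte, which shows why the 180GB peak reported above is dominated by the algorithms (model state, intermediate copies, parallel workers) rather than by the data itself - and why row count alone says little about feasibility.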

Answers

  • David_A Administrator, Moderator, Employee, RMResearcher, Member Posts: 297 RM Research
    Hi,

    it depends on the type of license you are using.
    If you have a (30-day) trial or educational license, there is no row limit.
    The regular free license has a limit of 10k rows, and the commercial (paid) versions scale up from that limit, again up to unlimited rows.

    Best regards,
    David
  • Jeffersonjpa Member Posts: 5 Contributor I
    But what is the maximum number of rows you have already imported in production? I would like real examples.
  • David_A Administrator, Moderator, Employee, RMResearcher, Member Posts: 297 RM Research
    Do you need a single number (as in an extreme use case) or an average (as in a survey of typical data sizes)?
    As mentioned, a single maximum number (especially reduced to just the number of rows, without the number of columns or the applied algorithm) does not carry a lot of information.