What is the maximum number of rows?
Jeffersonjpa
Member Posts: 5 Learner II
in Help
What is the maximum number of rows you have imported into RapidMiner? 10 million?
Best Answers
David_A Administrator, Moderator, Employee, RMResearcher, Member Posts: 297 RM Research
You mean, what's the largest data set you can work with?
That depends heavily on your available hardware (storage space, RAM, ...), but other than that, there is no limit (provided you don't hit your license limit). On my travel laptop with only 8GB of RAM, I could easily create a test data set with 10 million rows of random data.
But of course, once you actually start working with the data, the memory requirements and practical runtime limits become more complex.
I hope that helps.
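The point about memory can be made concrete with a rough back-of-envelope calculation. The sketch below is not part of RapidMiner; it simply estimates the raw cell storage of a numeric data set, assuming roughly 8 bytes per value (real overhead from nominal attributes, object headers, and intermediate results during processing can easily multiply this):

```python
# Hypothetical back-of-envelope estimate of a data set's in-memory size.
# Assumes ~8 bytes per numeric cell (e.g. a double); actual overhead from
# nominal/string attributes and processing can be several times larger.

def estimate_memory_gb(rows, cols, bytes_per_cell=8):
    """Return an approximate raw in-memory size in gigabytes."""
    return rows * cols * bytes_per_cell / 1024**3

# 10 million rows x 50 numeric columns: roughly 3.7 GB of raw cell data
print(f"{estimate_memory_gb(10_000_000, 50):.1f} GB")
```

By this estimate, a 10-million-row test set with a handful of numeric columns stays well under a gigabyte of raw data, which is why it fits comfortably in 8GB of RAM; it is the overhead of actually processing the data that pushes requirements up.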
sgenzer Administrator, Moderator, Employee, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
Hi @Jeffersonjpa, I don't think you're really going to get an answer to this question. Almost all of our customers use proprietary data, and hence we are not able to give you what you're looking for. I can, however, share this example of just how powerful the platform is, given enough resources. It is from an unnamed commercial customer running real data:
Dataset: 1.5m examples (rows), 49 attributes (columns) of which 5 were nominal and 44 were numerical
Hardware: cluster of 64 AMD Opteron 6380 processors (16 cores each, 2.5 GHz), 504GB RAM with 384GB swap
Generalized Linear Model (GLM): runtime = 1 min 21 sec
Deep Learning (H2O implementation): runtime = 7 min 29 sec
User reported that all CPUs were "pegged" during this run with up to 180GB being consumed at times.
Does this help? It's one example. You can have another data set with the same rows and columns that produces very different runtimes due to what those rows and columns contain. All I'm trying to share is that RapidMiner will use pretty much whatever resources you throw at it.
Scott
Answers
It depends on the type of license you are using.
If you have a (30-day) trial or educational license, there is no row limit.
The regular free license has a limit of 10,000 rows, and the commercial (paid) versions scale up from that limit, again up to unlimited rows.
Best regards,
David
As mentioned, a single maximum number (especially one reduced to the row count alone, without the number of columns or the algorithm applied) does not convey much information.