What is the maximum number of rows
Jeffersonjpa
Member Posts: 5 Learner II
in Help
What is the maximum number of rows you have imported into RapidMiner? 10 million?
Best Answers
David_A Administrator, Moderator, Employee-RapidMiner, RMResearcher, Member Posts: 297 RM Research
You mean, what's the largest data set you can work with?
That would highly depend on your available hardware (storage space, RAM, ...), but other than that there is no limit (provided you don't hit your license limit). On my travel laptop with only 8GB of RAM, I could easily create a test data set with 10 million rows of random data.
But of course, once you actually start working with the data, the memory requirements and practical runtime limits become more complex.
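For a rough sense of why 10 million rows fit comfortably alongside 8GB of RAM, here is a back-of-envelope estimate (my own Python sketch; the column count and the 8-byte-double storage are assumptions, not details from the thread):

```python
# Back-of-envelope memory estimate for a 10-million-row test set.
# Assumption: 5 numeric columns stored as 8-byte doubles, ignoring
# object and index overhead.
ROWS = 10_000_000
COLS = 5
BYTES_PER_VALUE = 8  # 64-bit double

raw_bytes = ROWS * COLS * BYTES_PER_VALUE
print(f"raw data: {raw_bytes / 1024**3:.2f} GiB")  # prints "raw data: 0.37 GiB"
```

Even with generous per-cell overhead on top of that, the raw table is a small fraction of 8GB; it is the processing on top of the data that pushes memory use up.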
I hope that helps.
sgenzer Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager
hi @Jeffersonjpa I don't think you're really going to get an answer to this question. Almost all of our customers use proprietary data, and hence we are not able to give you what you're looking for. I can, however, share this example of just how powerful the platform is - given enough resources. It is from an unnamed commercial customer running real data:
Dataset: 1.5m examples (rows), 49 attributes (columns) of which 5 were nominal and 44 were numerical
Hardware: cluster of 64 AMD Opteron 6380 processors (16 cores each, 2.5GHz), 504GB RAM with 384GB swap
Generalized Linear Model (GLM): runtime = 1 min 21 sec
Deep Learning (H2O implementation): runtime = 7 min 29 sec
User reported that all CPUs were "pegged" during this run with up to 180GB being consumed at times.
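As a sanity check on those numbers, the raw numeric data itself is tiny compared to the 180GB peak, which suggests the working memory of the algorithms dominates. A rough Python estimate (my own sketch; 8-byte-double storage is an assumption, and the 5 nominal columns are ignored):

```python
# Rough raw size of the data set above: 1.5M rows x 44 numeric
# columns at 8 bytes each (the 5 nominal columns are ignored).
ROWS = 1_500_000
NUMERIC_COLS = 44

raw_gb = ROWS * NUMERIC_COLS * 8 / 1024**3
print(f"raw numeric data: ~{raw_gb:.2f} GiB")  # prints "~0.49 GiB"
```

So roughly half a gigabyte of raw values against up to 180GB consumed: the model training, not the table, is what eats the memory.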
Does this help? It's just one example; another data set with the same number of rows and columns can produce very different runtimes depending on what those rows and columns contain. All I'm trying to share is that RapidMiner will use pretty much whatever resources you throw at it.
Scott
Answers
It depends on the type of license you are using.
If you have a (30-day) trial or an educational license, there is no limit on the number of rows.
The regular free license has a limit of 10,000 rows, and the commercial (paid) versions scale up from that limit, again up to unlimited rows.
Best regards,
David
As mentioned, a single maximum number (especially one reduced to the row count alone, without the number of columns or the algorithm applied) does not carry much information.
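To illustrate why row count alone says little, compare the per-value cost of a numeric column with a nominal (string) column. This is a CPython sketch of my own, not how RapidMiner stores data internally:

```python
import sys

ROWS = 10_000_000
per_double = 8                         # one raw 64-bit float value
per_str = sys.getsizeof("id-1234567")  # one short Python string object

# Same row count, very different footprint per column:
print(f"10M doubles: {ROWS * per_double / 1024**3:.2f} GiB")
print(f"10M strings: {ROWS * per_str / 1024**3:.2f} GiB")
```

A string column costs several times what a double column does per row, so two data sets with identical row counts can have very different memory (and runtime) profiles.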