Large Data Handling
Hi,
I'm trying to classify a large dataset (about 500K rows) using Decision Trees in RapidMiner.
Unfortunately, the GUI gives me a "not enough memory" error even with 50K rows.
Since C4.5 can learn from the data instance by instance, is it possible to "stream" the data so the whole dataset does not have to be held in memory?
Answers
2) If memory is a problem, store the data in a database and read it from there instead of loading everything at once (a rough chunked-read sketch follows below the list).
3) The main problem is that the decision tree (DT) implementation built into RapidMiner is quite poor. My advice: don't use it; use J48 from the Weka plugin instead, or try other models such as kNN or Naive Bayes (an incremental-training sketch also follows below).
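This isn't RapidMiner-specific, but as a minimal sketch of what "keep it in a database and read it in chunks" means in practice, here is the idea in plain Python with pandas. The file name "examples.db" and table name "examples" are made up; only one 50K-row chunk is ever held in memory at a time.

```python
import sqlite3

import pandas as pd

# Hypothetical SQLite file and table holding the 500K-row dataset.
conn = sqlite3.connect("examples.db")

# chunksize makes read_sql_query return an iterator of DataFrames,
# so the whole table is never loaded at once.
for chunk in pd.read_sql_query("SELECT * FROM examples", conn, chunksize=50_000):
    # Process or aggregate each 50K-row chunk here instead of the full table.
    print(len(chunk), "rows processed in this chunk")

conn.close()
```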
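And if you end up going outside the GUI altogether, here is a minimal sketch of training a model incrementally so the full table never has to fit in memory. It assumes scikit-learn, a CSV called "data.csv" with a label column named "label", and purely numeric attributes; all of those names are hypothetical, and Naive Bayes is used here simply because it supports incremental updates.

```python
import pandas as pd
from sklearn.naive_bayes import GaussianNB

model = GaussianNB()

# partial_fit needs the full set of class labels on its first call;
# read only the (hypothetical) "label" column once to collect them.
classes = pd.read_csv("data.csv", usecols=["label"])["label"].unique()

# Train chunk by chunk so only ~50K rows are in memory at any time.
for chunk in pd.read_csv("data.csv", chunksize=50_000):
    X = chunk.drop(columns=["label"]).to_numpy()  # numeric attributes assumed
    y = chunk["label"].to_numpy()
    model.partial_fit(X, y, classes=classes)
```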