Mining high-dimensional data
Hello!

I have a high-dimensional data set: it consists of about 2000 records, and its dimensionality is also about 2000.
I admit I may still be an amateur at mining high-dimensional data... ;D
What I would like to ask is: what are good strategies for mining high-dimensional data with RapidMiner 5?
Thanks in advance for a quick reply!

Regards,
Dimas Yogatama
Answers
I'm not clear on exactly what you want to know, but you should understand that RapidMiner can handle much larger data sets than you describe. For example, if I run the following to generate a 10k * 10k matrix ... it doesn't take too long. For comparison, I'm on XP64 with two quad-core CPUs and 16 GB of RAM, and for Windows boxes it is that 64-bit part that matters, since a 32-bit process can only address around 3 GB (you'll have to Google for the exact number).
So the bottom line is that the main strategy is to have lots of memory, if I remember correctly.
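The memory point is easy to check with back-of-the-envelope arithmetic: a 10k * 10k table of double-precision values needs roughly 0.75 GiB just for the raw numbers, which fits comfortably in 16 GB but already eats a large chunk of what a 32-bit process can address. A quick sketch (plain Python, numbers illustrative):

```python
# Rough memory footprint of a 10,000 x 10,000 matrix of 64-bit doubles.
rows = cols = 10_000
bytes_per_double = 8
footprint = rows * cols * bytes_per_double  # raw data only, no overhead

print(f"{footprint / 2**30:.2f} GiB")  # prints "0.75 GiB"
```

Real data sets carry extra overhead (object headers, indexes, copies made by operators), so in practice you should budget a multiple of this figure.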
What I mean by strategy is: how do I optimize accuracy by selecting only the "good" attributes among all the available ones? If the hardware question is critical: I only have a laptop (Lenovo Y450-310) with a Core 2 Duo at 2.2 GHz and 2 GB of DDR3 RAM. Is that really a problem?
Also, I would like to apologize for my English; I am still learning.

Regards,
Dimas Yogatama
Of course the amount of memory makes a difference. If the data doesn't fit into memory, the process either fails or you will need to stream the data from a database, which might slow it down a lot.
Coming back to the strategy question: RapidMiner offers several methods for selecting attributes. You might use the Forward Selection or Backward Elimination operator as a simple start. If those do not suit your needs or take too long, you can pick another operator from the group Data Transformation / Attribute Set Reduction and Transformation / Selection and its subgroups.
Greetings,
Sebastian
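To make the idea concrete: forward selection greedily adds one attribute at a time, keeping whichever most improves the model. This is not RapidMiner's implementation, just a minimal sketch of the technique, using a least-squares fit as an assumed wrapper score and synthetic data:

```python
import numpy as np

def forward_selection(X, y, k):
    """Greedy forward selection: at each step, add the attribute that most
    reduces the squared error of a linear fit (a simple wrapper score)."""
    selected = []
    remaining = list(range(X.shape[1]))
    for _ in range(k):
        best_j, best_err = None, np.inf
        for j in remaining:
            cols = selected + [j]
            # Fit a linear model on the candidate attribute subset (+ intercept).
            A = np.column_stack([X[:, cols], np.ones(len(y))])
            coef, *_ = np.linalg.lstsq(A, y, rcond=None)
            err = np.sum((A @ coef - y) ** 2)
            if err < best_err:
                best_j, best_err = j, err
        selected.append(best_j)
        remaining.remove(best_j)
    return selected

# Synthetic data: only columns 2 and 7 actually drive the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
y = 3 * X[:, 2] - 2 * X[:, 7] + rng.normal(scale=0.1, size=200)

print(forward_selection(X, y, 2))  # picks columns 2 and 7
```

The cost is the catch: each round refits a model once per remaining attribute, so with ~2000 attributes a wrapper approach means thousands of model fits per selected attribute. That is why it can "take too long" and why the cheaper filter-based operators exist.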
Then what actually determines how long model learning takes, in general, if we talk about the data: the total number of records, or the number of dimensions?
Well, there is no general answer to this. In some settings the complete removal of attributes works better; in others, a rescaling based on weights does. The same is true when it comes to "weight by wrapper" versus "weight by filtering". From my experience, if you have severe problems with data set size and no other option is available, calculating weights followed by a weight-based selection can help without losing too much accuracy.
Cheers,
Ingo
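The weight-then-select strategy Ingo describes can be sketched like this: compute a cheap filter weight per attribute in a single pass (here absolute Pearson correlation with the label, standing in for one of RapidMiner's weighting operators), then keep only the top-k attributes. Function names and data are illustrative:

```python
import numpy as np

def correlation_weights(X, y):
    """Filter-style attribute weights: the absolute Pearson correlation of
    each attribute with the label. One pass over the data, no model fits."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    cov = Xc.T @ yc
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    return np.abs(cov / denom)

def select_top_k(weights, k):
    """Keep the indices of the k attributes with the largest weights."""
    return np.sort(np.argsort(weights)[-k:])

# Synthetic data: only columns 5 and 20 carry signal; the other 48 are noise.
rng = np.random.default_rng(1)
X = rng.normal(size=(300, 50))
y = 2 * X[:, 5] + X[:, 20] + rng.normal(scale=0.5, size=300)

w = correlation_weights(X, y)
print(select_top_k(w, 2))  # the informative columns 5 and 20 get the largest weights
```

Unlike the wrapper approach, this costs one pass over the data regardless of how many attributes you keep, which is why it scales to 2000 attributes where forward selection may not. The trade-off is that simple correlation weights score attributes independently and can miss interactions between them.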