Transform data into a smaller table in which every attribute value is represented
moritz_moeller
Hey there,
Since my data set is too big to analyze with a clustering algorithm (and I don't want to wait as long as that takes), I want to reduce it to a smaller set.
My question is whether it is possible to transform it into a data set in which every attribute is represented in proportion. For example: I have a data set with 3 columns, each of which can take 5 different values (1 to 5), and 10 million rows. Now I want a data set that contains all 3 columns with all of their values, but only 100k rows, so that I can analyze it. Is there an option to do that automatically in RM? If not, I think I will have to do it manually somehow.
Thanks and Greetings,
Moritz
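One way to get such a subset, assuming the data can be exported from RM and loaded into memory, is proportional stratified sampling over the combination of the three column values. Each of the 3 × 5 × 5 × 5 = 125 possible combinations (strata) keeps roughly its original share of rows, and every combination that occurs in the full data appears at least once in the sample, even if it is rare. The sketch below uses only the Python standard library. The function `stratified_sample` and its parameters are hypothetical illustrations, not a RapidMiner feature.

```python
import random
from collections import defaultdict


def stratified_sample(rows, key, target_size, seed=0):
    """Draw roughly target_size rows so that every distinct key value
    (stratum) keeps about its original share and appears at least once."""
    # Group rows by stratum, e.g. the tuple of all three column values
    strata = defaultdict(list)
    for row in rows:
        strata[key(row)].append(row)

    total = sum(len(members) for members in strata.values())
    rng = random.Random(seed)
    sample = []
    for members in strata.values():
        # Proportional allocation, but never fewer than one row per stratum
        n = max(1, round(len(members) * target_size / total))
        n = min(n, len(members))
        sample.extend(rng.sample(members, n))
    return sample


# Usage with synthetic data: 3 columns, values 1 to 5
gen = random.Random(42)
data = [tuple(gen.randint(1, 5) for _ in range(3)) for _ in range(10_000)]
subset = stratified_sample(data, key=lambda row: row, target_size=1_000)
print(len(subset))
```

Because each stratum's share is rounded and every stratum gets at least one row, the result can differ slightly from `target_size`. Holding all 10 million rows in memory is the main cost of this approach; for the clustering step itself, only the much smaller subset is needed.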
Best Answer
SGolbert (RapidMiner Certified Analyst)
Hi Moritz,
I haven't found dimension reduction techniques for polynominal variables in RM. Maybe it is possible to use feature selection.
Regarding the rows: these are the examples you use for training and testing, and it is up to you how many of them you use. There is no need to use all the rows, at least not until you deploy the final model. Of course, it also depends on the kind of data; if it is a time series, the approach should be different.
Regards,
Sebastian
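If a plain random subset of the examples is enough, the rows do not even need to fit in memory. Reservoir sampling (Algorithm R, a standard technique named here as an illustration, not something the answer or RM is confirmed to use) reads the data once and keeps a uniform random sample of k rows. A minimal sketch in Python, where `reservoir_sample` is a hypothetical name:

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Uniform random sample of k items from a stream of unknown length,
    reading it once and keeping only k items in memory (Algorithm R)."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items
            reservoir.append(item)
        else:
            # Keep item i with probability k / (i + 1), replacing a random slot
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir
```

For example, passing a `csv.reader` over an exported file (the file itself is hypothetical) with `k=100_000` would yield a 100k-row subset in a single pass. Unlike stratified sampling, this does not guarantee that rare value combinations appear, and for time series a random subset breaks the temporal order, which matches the caveat above.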