
Grouping of classes

caste Member Posts: 9 Contributor II
edited November 2018 in Help
Hi

With RapidMiner, is it possible to automatically collapse the classes in a learning set into a given number of classes, based on their cardinality, so that the variance in class size is reduced? The goal is to improve the precision of methods such as SVM and KNN.

I have a learning set of 20,000 elements divided into more than 100 classes, with high variance in the number of elements per class, and I need to reduce them to 20 classes.

For example:

Class A - 3 elements
Class B - 4 elements
Class C - 8 elements

It would be nice to be able to reduce them to a given number of classes, e.g. 2, like this:

Class 1 - 7 elements (obtained from Classes A and B)
Class 2 - 8 elements (obtained from Class C)


Please help me! I'm trying operations research methods, but I have so little time...

Thank you!


Answers

  • land RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 2,531 Unicorn
    Hmm,
    I'm not quite sure I understood you correctly. Do you want to merge the most similar classes to improve precision? That would not improve performance on the original problem; it would simply change the problem...
    But if you want to do this manually, you could use the MergeNominalValues operator. Perhaps you should take a look at the parameterIteration operator and its examples in the meta directory of the example processes. It could save you a lot of typing.

    Greetings,
      Sebastian
  • caste Member Posts: 9 Contributor II
    I needed this because I'm building a hashing system to distribute a large volume of information. The semantic relationships between classes are not that important, so I can collapse classes based on their weight in this context rather than on their names. This balancing helps the SVM's recognition.

    In the end I solved my problem with an operations research method: an implementation of the Assembly Line Balancing problem.

    Just a note: I tried the evolutionary parameter optimization from the examples, but even on the examples it took many hours, so I decided to change my approach.

    Thanks for your help, and congratulations on the software you built and on the choice to keep it open source: it is really great!
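
For readers with the same problem, here is a minimal sketch of balancing classes by cardinality outside RapidMiner. It is not the Assembly Line Balancing implementation mentioned above (the thread does not show it). It swaps in a simpler greedy heuristic: take the classes largest first and assign each one to the group that is currently smallest. The function names `group_classes` and `group_totals` are hypothetical, and the sizes are the A/B/C example from the question.

```python
import heapq


def group_classes(class_sizes, n_groups):
    """Greedy balancing: largest class first, always into the lightest group.

    class_sizes: dict mapping class name -> number of elements.
    Returns a dict mapping class name -> group index (0 .. n_groups - 1).
    """
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")

    # Min-heap of (current total size, group index).
    heap = [(0, g) for g in range(n_groups)]
    heapq.heapify(heap)

    assignment = {}
    # Sort by size descending; break ties by name so the result is deterministic.
    for name, size in sorted(class_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        total, group = heapq.heappop(heap)  # lightest group so far
        assignment[name] = group
        heapq.heappush(heap, (total + size, group))
    return assignment


def group_totals(class_sizes, assignment, n_groups):
    """Sum the element counts of the classes assigned to each group."""
    totals = [0] * n_groups
    for name, group in assignment.items():
        totals[group] += class_sizes[name]
    return totals


# The example from the question: three classes reduced to two groups.
sizes = {"A": 3, "B": 4, "C": 8}
assignment = group_classes(sizes, 2)
totals = group_totals(sizes, assignment, 2)
print(assignment, totals)
```

On this example the heuristic puts A and B together (7 elements) and leaves C alone (8 elements), matching the grouping asked for in the question. The resulting mapping could then be applied to the label column, for instance by hand with MergeNominalValues. A greedy pass like this is fast but not guaranteed to find the most balanced grouping.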

