[SOLVED] Speed / Evaluation time improvement of kNN Classifier

jaysonpryde Member Posts: 20 Contributor II
Good day,

   I've developed a Java application that uses RapidMiner.jar (and the other jars) to classify my test data. The classifier is kNN (k=3, distance measure = cosine similarity). I've already optimized k and the choice of distance measure.
My model consists of 25k rows with 31 attributes.
 When I run a test set, a CSV file averaging 3k rows, execution takes very long: more than 1 hour on average.
 Do you have any suggestions or recommendations for improving the execution/evaluation time of my kNN classifier application, based on the details above?

Hoping to receive feedback. Thank you  :)

Answers

  • MariusHelf RapidMiner Certified Expert, Member Posts: 1,869 Unicorn
    Hi,

    As you probably know, kNN is a lazy learner: training a model is very fast (basically just storing the training set), but applying it is quite slow, since the k nearest neighbours have to be found for each new example. The only way to reduce the execution time of kNN is to reduce the size of the training set, either by removing attributes or by removing examples (the latter will probably have the greater impact).
    Otherwise I would suggest using a learner other than kNN. Basically any learner that actually creates a model will be much faster during application than kNN. Additionally, you may be able to learn something about your data by looking at the model. A linear SVM, for example, outputs attribute weights, so you can see how much influence each attribute has on the classification. You may want to try: SVM (linear or RBF kernel), decision trees, Linear Regression if you have a regression problem, ...

    Best regards,
    Marius
  • jaysonpryde Member Posts: 20 Contributor II
    Thank you very much for this feedback! :)
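
To make the cost argument in Marius's reply concrete, here is a minimal Java sketch of a brute-force kNN classifier with cosine similarity. It is not RapidMiner's implementation; the class name `KnnCostSketch` and all method names are hypothetical and exist only for illustration. It shows that every query scans the whole training set, so the number of similarity computations is (training rows) × (test rows), and each one loops over all attributes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

// Hypothetical illustration class; not part of RapidMiner.
public class KnnCostSketch {

    /** Cosine similarity of two equal-length vectors; returns 0 if either has zero norm. */
    static double cosineSimilarity(double[] a, double[] b) {
        double dot = 0.0, normA = 0.0, normB = 0.0;
        for (int i = 0; i < a.length; i++) { // one pass over all attributes
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            return 0.0;
        }
        return dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }

    /** Brute-force kNN: compares the query against every training row, then takes a majority vote. */
    static String classify(double[][] train, String[] labels, double[] query, int k) {
        // Min-heap ordered by similarity; it holds the k most similar rows seen so far.
        // Each entry is {similarity, training row index}.
        PriorityQueue<double[]> best =
                new PriorityQueue<>((x, y) -> Double.compare(x[0], y[0]));
        for (int i = 0; i < train.length; i++) {
            best.add(new double[] { cosineSimilarity(train[i], query), i });
            if (best.size() > k) {
                best.poll(); // discard the least similar candidate
            }
        }
        // Majority vote among the k nearest neighbours (ties are broken arbitrarily).
        Map<String, Integer> votes = new HashMap<>();
        String winner = null;
        int winnerVotes = 0;
        for (double[] entry : best) {
            String label = labels[(int) entry[1]];
            int count = votes.merge(label, 1, Integer::sum);
            if (count > winnerVotes) {
                winnerVotes = count;
                winner = label;
            }
        }
        return winner;
    }

    /** Number of similarity computations a brute-force scan needs. */
    static long similarityComputations(long trainRows, long testRows) {
        return trainRows * testRows;
    }

    public static void main(String[] args) {
        // Sizes taken from the question: 25k training rows, 3k test rows.
        System.out.println(similarityComputations(25_000, 3_000));
    }
}
```

With the sizes from the question, that is 25,000 × 3,000 = 75,000,000 similarity computations, each over 31 attributes. Halving the number of training examples halves that count, which is why reducing the training set speeds up application.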