"Good datasets for testing algorithm performance"

wessel Member Posts: 537 Maven
edited June 2019 in Help
Dear All,

I'm creating a genetic algorithm that can do both regression and classification.
I'm comparing the algorithm against neural networks, tree learners, and nearest neighbor methods.

What are good datasets to compare on?
I'm currently browsing the UCI repository, but choices are endless.
Is there some default benchmark new algorithms are tested against?

Best regards,

Wessel

Answers

  • marcin_blachnik Member Posts: 61 Guru
    I'd recommend the datasets from Isabelle Guyon's challenges (http://clopinet.com/challenges/). These datasets are also available in the UCI repository. They are not too small, they are not trivial, and you know what the best results are: you just have to look at the competition page. Most of the other popular datasets, such as Ionosphere, SpamBase, Pima Indians Diabetes, and Wisconsin Breast Cancer, are all trivial: for all these methods, the best results can be obtained with a linear classifier. On Duch's webpage you can see a comparison of results for all the popular UCI datasets: http://www.is.umk.pl/projects/datasets.html
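
One way to act on the point above is to check whether a linear classifier already matches the other learners on a candidate dataset before adopting it as a benchmark. The following is only a minimal sketch under stated assumptions, not anyone's actual setup from this thread: every function name is hypothetical, the data is a synthetic two-blob toy set rather than a UCI or challenge dataset, and a plain perceptron stands in for "a linear classifier". It uses only the Python standard library.

```python
import random

def k_fold_indices(n, k, seed=0):
    """Shuffle indices 0..n-1 and split them into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]

def cross_val_accuracy(fit, X, y, k=5, seed=0):
    """Mean held-out accuracy over k folds.

    `fit(X_train, y_train)` must return a function mapping one sample to a label.
    """
    scores = []
    for test_idx in k_fold_indices(len(X), k, seed):
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        predict = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        correct = sum(predict(X[i]) == y[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / len(scores)

def fit_majority(X_train, y_train):
    """Baseline: always predict the most frequent training label."""
    label = max(set(y_train), key=y_train.count)
    return lambda x: label

def fit_1nn(X_train, y_train):
    """1-nearest-neighbour using squared Euclidean distance."""
    def predict(x):
        best = min(range(len(X_train)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(X_train[i], x)))
        return y_train[best]
    return predict

def fit_perceptron(X_train, y_train, epochs=20):
    """Linear classifier for 0/1 labels trained with the perceptron rule."""
    w = [0.0] * len(X_train[0])
    b = 0.0
    for _ in range(epochs):
        for x, target in zip(X_train, y_train):
            pred = 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0
            err = target - pred  # -1, 0 or +1
            if err:
                w = [wi + err * xi for wi, xi in zip(w, x)]
                b += err
    return lambda x: 1 if sum(wi * xi for wi, xi in zip(w, x)) + b > 0 else 0

def make_toy_data(n=200, seed=1):
    """Synthetic stand-in for a real dataset: two well-separated 2-D blobs."""
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = i % 2
        center = 2.0 if label else -2.0
        X.append([rng.gauss(center, 0.5), rng.gauss(center, 0.5)])
        y.append(label)
    return X, y

X, y = make_toy_data()
for name, fit in [("majority", fit_majority),
                  ("1-NN", fit_1nn),
                  ("perceptron", fit_perceptron)]:
    print(name, round(cross_val_accuracy(fit, X, y), 3))
```

If the linear model scores about as well as the nearest-neighbour learner, as it should on this deliberately easy toy data, the dataset probably cannot separate stronger methods from each other. To try this on a real dataset, replace `make_toy_data` with your own loader and add your genetic algorithm as another `fit` function.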