Which are the most important parameters to tune for k-NN, NB, RF, DL, SVM for text classification?

jochen_hartmann · May 2017

Dear community,

I would like to compare the performance of the following five algorithms on different text classification tasks*:

Question 1: Which parameters are the most important to optimize for each method 1-5?

Question 2: What ranges should I give those parameters in the parameter optimization operator in order to avoid "boiling the ocean"?

Thanks in advance!

* each task has between 3 and 5 classes, and the text length varies between 3 and 70 words per document / example

Thomas_Ott · May 2017

Great question!

Telcontar120 · May 2017

Excellent suggestions from @Thomas_Ott as usual. I would add a couple more:

There isn't actually anything to optimize with Naive Bayes: there is only one parameter (Laplace correction), and I would definitely leave it on.
For Random Forest, I would also optimize the growing criterion (information gain, gain ratio, Gini, accuracy).
For SVM, you might also try a polynomial kernel and optimize C as well as degree in the range of 1-4.
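To get a feel for how big the search gets before running anything, here is a minimal Python sketch that enumerates the combinations implied by the three suggestions above. It is only an illustration, not a RapidMiner process: the dictionary keys and the `expand_grid` helper are hypothetical names, and the C values are an assumed log-spaced range, since no range for C was given in this thread.

```python
from itertools import product

# Search space based on the suggestions above. The C values are an ASSUMED
# log-spaced range; the thread only gives a range for the degree (1-4).
search_space = {
    "naive_bayes": {"laplace_correction": [True]},  # left on, nothing to tune
    "random_forest": {
        "criterion": ["information_gain", "gain_ratio", "gini_index", "accuracy"],
    },
    "svm_polynomial": {
        "C": [0.01, 0.1, 1.0, 10.0, 100.0],  # assumption, adjust to your data
        "degree": [1, 2, 3, 4],
    },
}


def expand_grid(params):
    """Return every parameter combination as a list of dicts."""
    names = list(params)
    value_lists = [params[name] for name in names]
    return [dict(zip(names, values)) for values in product(*value_lists)]


grids = {model: expand_grid(params) for model, params in search_space.items()}

for model, combos in grids.items():
    print(f"{model}: {len(combos)} combination(s)")
```

Each combination is one model to train and validate, so multiplying the counts by the number of cross-validation folds gives a rough idea of the total effort per task.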
