"Bug in MinimalEntropyPartitioning?"

Legacy User · November 2008

Hello everybody,

I get strange results when I apply MinimumEntropyPartitioning on some datasets and wonder whether this is due to a bug in the implementation.

Let me illustrate the problem: I have a dataset with one attribute ("X") and one label with two possible values.
There are 6 possible values for X, 1 to 6. In total, I have 1116 rows, with the following target label distributions:

X-value   #negatives   #positives   #rows
1.0       124          62           186
2.0       124          62           186
3.0       0            186          186
4.0       0            186          186
5.0       124          62           186
6.0       124          62           186

Now of course I would expect a discretization into [-∞, 2], ]2, 4], ]4, ∞] with 372 rows each. Instead, I get:

range1 [-∞ - 2] (372), range2 [2 - 5] (558), range3 [5 - ∞] (186)

It seems like there is a bug in the operator that does not correctly distinguish open and closed interval limits.
Does anybody know of a solution or a workaround?

Best,

Henrik

land · November 2008

Hi Henrik,
this does indeed seem to be a problem. Perhaps you could add a tiny bit of noise to your values, which would resolve the non-uniqueness that causes your problem.

But to solve it in general I will take a look at the code.

Greetings,
Sebastian

Legacy User · November 2008

Hi Sebastian,

thanks for the reply. I also thought that the problem could be diminished if I had more continuous values. But of course it would be best if you could fix the problem in general.

Best,

Henrik

Legacy User · April 2009

Hi,

in the meantime I found the bug and fixed it. The bug is in the function

    private Double getMinEntropySplitpoint(LinkedList<double[]> truncatedExamples, Attribute label) {

in the class MinimalEntropyDiscretization. It does not consider the case where a split results in 0 examples of one class. Here is the fix:

    // Calculate entropies.
    double entropy1 = 0.0d;
    for (int i = 0; i < label.getMapping().size(); i++) {
        // Skip classes with no examples: 0 * ld(0) would evaluate to NaN.
        if (frequencies1[i] != 0.0d) {
            entropy1 -= frequencies1[i] * MathFunctions.ld(frequencies1[i]);
        }
    }
    double entropy2 = 0.0d;
    for (int i = 0; i < label.getMapping().size(); i++) {
        if (frequencies2[i] != 0.0d) {
            entropy2 -= frequencies2[i] * MathFunctions.ld(frequencies2[i]);
        }
    }
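
The zero-count case can be reproduced in isolation. The following is only a minimal standalone sketch, not the RapidMiner code: the class EntropyGuardDemo and its methods are made up for illustration, ld() stands in for MathFunctions.ld, and the counts are the rows with X > 2 from the table at the top of the thread. It compares the size-weighted entropy of a cut after 4 and a cut after 5, with and without skipping zero frequencies.

```java
public class EntropyGuardDemo {

    /** Base-2 logarithm; a stand-in for MathFunctions.ld (assumption). */
    static double ld(double x) {
        return Math.log(x) / Math.log(2.0);
    }

    /** Entropy of a class distribution. With guard == false, zero frequencies are not skipped. */
    static double entropy(double[] frequencies, boolean guard) {
        double entropy = 0.0d;
        for (int i = 0; i < frequencies.length; i++) {
            if (guard && frequencies[i] == 0.0d) {
                continue; // an empty class contributes nothing to the entropy
            }
            // Without the guard: 0.0 * ld(0.0) = 0.0 * -Infinity = NaN
            entropy -= frequencies[i] * ld(frequencies[i]);
        }
        return entropy;
    }

    /** Size-weighted entropy of a split; each side is {negatives, positives}. */
    static double splitEntropy(double[] left, double[] right, boolean guard) {
        double nLeft = left[0] + left[1];
        double nRight = right[0] + right[1];
        double[] fLeft = {left[0] / nLeft, left[1] / nLeft};
        double[] fRight = {right[0] / nRight, right[1] / nRight};
        return (nLeft * entropy(fLeft, guard) + nRight * entropy(fRight, guard))
                / (nLeft + nRight);
    }

    public static void main(String[] args) {
        // Class counts {negatives, positives} for the rows with X > 2.
        double[] x3to4 = {0, 372};
        double[] x5to6 = {248, 124};
        double[] x3to5 = {124, 434};
        double[] x6 = {124, 62};
        for (boolean guard : new boolean[] {false, true}) {
            System.out.println("guard=" + guard
                    + " cutAfter4=" + splitEntropy(x3to4, x5to6, guard)
                    + " cutAfter5=" + splitEntropy(x3to5, x6, guard));
        }
    }
}
```

Without the guard, the cut after 4 evaluates to NaN because the left side contains no negatives. With the guard, it has a clearly lower weighted entropy than the cut after 5. If the split search picks the minimum with an ordinary "<" comparison (an assumption about the surrounding code), a NaN candidate can never win, which would be consistent with the [2 - 5] range reported above.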

Best,

Henrik

IngoRM · April 2009

Hi Henrik,

thanks for sending this in! We will check and integrate your suggestion as soon as possible.

Cheers,
Ingo
