Interpreting LogLikelihood For LDA Topic Modeling

svtorykh · June 2018

Hi RM Community,

Based on the attached picture, how should I interpret LogLikelihood values as the number of topics changes? Is higher better, or lower? Does the value need to be squared to be positive?

Thanks!

MartinLiebig · June 2018

Hi @svtorykh,

-240000 is better.

BR,

Martin

MartinLiebig · June 2018

Hi,

It's the negative LLH. The lower, the better.

BR,
Martin

svtorykh · June 2018

Thanks for the prompt reply. So in this case, is -230000 better than -240000, or vice versa?

svtorykh · June 2018

Thanks so much Martin!

MartinLiebig · June 2018

By the way, @svtorykh,

one of the next updates will have more performance measures for LDA. I just need to find time to implement them. LLH by itself is always tricky, because it naturally drops as the number of topics increases.

BR,

Martin

svtorykh · June 2018

That would be very nice to have! Please keep us posted Martin!

jozeftomas_2020 · June 2018

Hello. I want to find the optimal K for KMeans using the LDA LogLikelihood value.

For me, using heuristics for alpha and beta, the value for the top 5 is the highest. Now, how do I choose K optimally? Does anyone know how to help? Thanks a lot. I searched a lot, but I did not find anything.

MartinLiebig · June 2018

Hey @jozeftomas_2020,

I am fairly confused. KMeans and LDA are fairly different models. Why and how do you want to mix them?

~Martin

jozeftomas_2020 · June 2018

In the articles I have seen, LDA is used to find the optimal k, but I do not know how.
And how can I tell which LDA run has the better result? Do alpha and beta need to be adjusted slightly, or set very high, to get a better result?

I'm so sorry
Thanks a lot

MartinLiebig · June 2018

@svtorykh,

I've added Perplexity as the default performance measure for LDA. Perplexity is defined as

exp(-LLH/tokens)

and should be minimized. That's somewhat what you see in common blog posts on LDA.
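
A minimal Python sketch of this formula, for anyone who wants to compute it outside RapidMiner. It assumes `llh` is the natural-log likelihood summed over the whole corpus (normally a negative number) and `n_tokens` is the total token count. The function and argument names are illustrative and not part of the RapidMiner extension, and the numbers are placeholders rather than values from this thread.

```python
import math


def perplexity(llh: float, n_tokens: int) -> float:
    """Return exp(-LLH / tokens) for a model's total log-likelihood."""
    if n_tokens <= 0:
        raise ValueError("n_tokens must be positive")
    return math.exp(-llh / n_tokens)


# Placeholder numbers for illustration only.
example = perplexity(llh=-6000.0, n_tokens=1000)  # equals math.exp(6.0)
```

If the tool reports the negative log-likelihood instead, the sign has to be flipped before calling the function, so it is worth checking which convention the reported value uses.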

It's not yet on the marketplace. Let's see when we have enough features to publish.

Cheers,

Martin

svtorykh · June 2018

Thanks much!

MartinLiebig · June 2018

Always happy to help! Would it be possible for you to present your use case at RM Wisdom in October?

ayaRizk · June 2019

@svtorykh
May I ask how you generated the evaluation plot? Is there a specific operator for that, or did you plot it outside of RapidMiner?

Thanks!
/Aya

MartinLiebig · June 2019

Hi Aya,

Optimize Parameters (Grid) can create the log for it.

Best,

Martin

ayaRizk · June 2019

Hi @mschmitz
Yes, this works well. Thanks a lot!
/Aya
