Generate Prediction Ranking after LDA (Topic Modeling) Process
Hi RM Community!
I'm running an LDA (Topic Modeling) process on my text data and generating 30 topics. How can I apply Generate Prediction Ranking after the LDA process, so that my output contains 3-5 columns with the highest-confidence topics for each document (row in the table)?
Thanks!
Answers
Hi,
Since LDA provides confidences and predictions, there is no difference from classification problems here. That said, I can't tell you how to do this off the top of my head. Maybe @sgenzer knows?
~Martin
Dortmund, Germany
Thanks! I hope someone from the RM team can help with this. LDA generates one final prediction per document, based on the highest confidence value across all 30 topics. I need more flexibility: additional ranked topic columns in my output based on the 2nd, 3rd, ... highest confidence values. This could be done manually in Excel, but then what's the value of RM? :)
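To make the requested output concrete, here is a minimal sketch in plain Python of ranking the topics of one document by confidence. It is not RapidMiner code; the function name `rank_topics`, the topic labels, and the confidence values are all hypothetical and only illustrate the idea.

```python
from typing import Dict, List


def rank_topics(confidences: Dict[str, float], k: int = 3) -> List[str]:
    """Return the k topic labels with the highest confidence, best first."""
    # Sort (label, confidence) pairs by confidence, highest first.
    ranked = sorted(confidences.items(), key=lambda item: item[1], reverse=True)
    return [label for label, _ in ranked[:k]]


# Hypothetical document row: one confidence value per topic.
row = {"Topic_0": 0.05, "Topic_1": 0.40, "Topic_2": 0.25, "Topic_3": 0.30}

# The first entry corresponds to the usual single prediction; the
# following entries are the 2nd and 3rd ranked topics.
top3 = rank_topics(row, k=3)
print(top3)
```

Applied to every row, each position of the returned list would become one ranking column in the output table.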
OK, challenge accepted. Here's a classic sgenzer ETL hack job for you. It's not pretty, but the second-to-last operator (Filter Example Range) lets you select how many confidences you want.
Scott
Thanks for the effort guys! Will this work with 20K documents as well?
I see no reason why not. ;)
Hi @sgenzer and @svtorykh,
I just remembered that there is an operator for this. It's called Generate Prediction Ranking and should do the trick!
Sorry for not remembering this first. I think I have only used this operator once, 4 years ago.
Best,
Martin
Dortmund, Germany
Actually, that was the first operator I tried to use, but it wouldn't work with LDA confidences for some reason. Is it possible to see the process flow of applying ranking generator after LDA operator? I think some of the attributes must be changed, but not sure how to do it.
Hi Scott,
In your process, at which point is Att1 created? I can't find it in the example set.
Hi @svtorykh,
You found a bug in LDA, more or less.
Usually, confidence attributes are identified by their role, which has the form confidence_CLASSNAME. When I programmed the operator, I used Confidence_CLASSNAME (with a capital C) as the role for the probabilities, so Generate Prediction Ranking does not recognize them. You need to switch the roles of the confidence attributes manually (maybe with a loop). Attached is a process which demonstrates that it works afterward.
I will fix this bug, but most likely not this week or next. Another feature needs to be merged first, and I am traveling for 3 days next week.
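The role fix described above can be sketched outside RapidMiner as a simple loop over an attribute-to-role mapping. This is plain Python, not the attached process; the function name `normalize_confidence_roles` and the attribute names in the example are hypothetical, and only the two role prefixes come from the description above.

```python
from typing import Dict

WRONG_PREFIX = "Confidence_"   # role prefix written by the LDA operator (capital C)
RIGHT_PREFIX = "confidence_"   # role prefix that confidence roles usually have


def normalize_confidence_roles(roles: Dict[str, str]) -> Dict[str, str]:
    """Return a copy of the attribute -> role mapping with the prefix corrected."""
    fixed = {}
    for attribute, role in roles.items():
        if role.startswith(WRONG_PREFIX):
            # Keep the class name, replace only the capitalized prefix.
            role = RIGHT_PREFIX + role[len(WRONG_PREFIX):]
        fixed[attribute] = role
    return fixed


# Hypothetical attribute names and roles, for illustration only.
roles = {
    "text": "regular",
    "prediction(Topic)": "prediction",
    "confidence(Topic_0)": "Confidence_Topic_0",
    "confidence(Topic_1)": "Confidence_Topic_1",
}
fixed_roles = normalize_confidence_roles(roles)
print(fixed_roles)
```

Only roles that start with the capitalized prefix are changed; all other roles pass through untouched.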
Best,
Martin
Dortmund, Germany
"Generate Prediction Ranking" - nope never seen that one before! You learn something every day! Thanks @mschmitz for rendering my messy ETL completely useless
@svtorykh, Att1 is generated by Transpose. The operator creates it automatically.
Scott
Thanks! Would you please fix the descriptions of Alpha and Beta Heuristics? I think the descriptions need to be switched between the two!
@svtorykh,
sure will do! Thanks for reporting.
FYI: there is a small bug in the current Marketplace version. The alpha heuristic is wrong by a factor of #topics. This will be fixed in the next version. I've also added a feature to control Mallet's auto-tuning of alpha/beta.
Best,
Martin
Dortmund, Germany