Can I run an LDA model and emotion analysis on Chinese text with RapidMiner?
Hi everyone,
I am a newbie here and this is my question.
I need to apply a Latent Dirichlet Allocation (LDA) model and emotion analysis to Chinese text, but I don't know whether RapidMiner can do this, or which additional extensions I would need to install.
I have already searched the discussions about Chinese/Mandarin and installed the Hanminer extensions mentioned in one of them. But I don't think the Hanminer extensions are enough for both analyses, and no one seems to have asked this question before.
Please give me some suggestions. Any ideas would be much appreciated!
Best,
Polly
Answers
From my understanding, it should work. But @yyhuang is our Mandarin expert.
Cheers,
Martin
Dortmund, Germany
Thank you for your reply.
I read other discussions about LDA, and just to make sure: if I want to fit a Latent Dirichlet Allocation model, is 'Linear Discriminant Analysis' the operator I should use? Or is it the 'Extract Topic from Data' operator that most people mentioned in those discussions?
Also, which operator should I use for emotion analysis? Is it Singular Value Decomposition (SVD)?
Besides, in another LDA discussion where the process showed no results, you asked: "is this 'western' text? LDA uses a default tokenization on this tokens like spaces and so on. This may totally fail if this is not in latin alphabet?" So I guess the language of the text has a great influence on the results. To analyze Chinese text, are there any extensions or operators I need to install or combine?
Sorry for the large number of questions. I would much appreciate any advice you could give. Thanks in advance!
Regards,
Polly
And yes, LDA uses tokenization internally. I just realized that the default tokenization splits on \s (whitespace) and cannot be changed, so I guess it would be very hard to apply to Mandarin. As I said, I only speak German and English and am not an expert on tokenizing Mandarin/Cantonese, so I don't know whether it would even help if I offered the tokenization as an option.
Dortmund, Germany
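To illustrate the whitespace issue described above, here is a minimal Python sketch (not RapidMiner code). It shows that splitting on \s leaves an unsegmented Chinese sentence as a single token, and that inserting spaces before tokenizing gives the splitter something to work with. The helper names `whitespace_tokens` and `space_out_han` are hypothetical, and the character-level split is only a crude stand-in: a real word segmenter (for example the Hanminer extensions mentioned in this thread, or a library such as jieba) would produce word-level tokens instead. Whether pre-spaced text then behaves well inside the LDA operator is an assumption that would need to be tested.

```python
import re


def whitespace_tokens(text):
    """Split on runs of whitespace, like a tokenizer fixed to \\s."""
    return [tok for tok in re.split(r"\s+", text) if tok]


def space_out_han(text):
    """Put spaces around each CJK Unified Ideograph (U+4E00..U+9FFF).

    Crude character-level stand-in for a real Chinese word segmenter;
    it only makes the text splittable by a whitespace tokenizer.
    """
    spaced = re.sub(r"([\u4e00-\u9fff])", r" \1 ", text)
    # Collapse the extra spaces introduced by the substitution.
    return " ".join(spaced.split())


sentence = "我喜欢数据挖掘"  # "I like data mining", written without spaces

# Whitespace splitting sees the whole sentence as one token.
print(whitespace_tokens(sentence))

# After pre-spacing, each character becomes its own token.
print(whitespace_tokens(space_out_han(sentence)))
```

With a single token per document, a topic model has almost no repeated vocabulary to learn from, which would be consistent with an LDA process that returns no useful results on unsegmented Chinese text.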
Thank you for your help
I hope maybe @yyhuang can give me some advice on it.
Cheers,
Polly