"Text Mining"

Raghav · September 2010

Hi Rapid Guys,
I am Raghav, new to this forum and to RapidMiner too. On to my question: I have an Excel file with around 20K customer comments, and I would like to use text analytics/text mining to group the comments based on the patterns in them. Please advise whether this is possible with RapidMiner and, if "yes", how to do it. Thanks for the help!

IngoRM · September 2010

Hi Raghav,

to make things short: "yes"

But I am afraid I am not able to give you all the details necessary for such an analysis within one single short post here. That's partly because I would need more information (exact data format, purpose of the analysis etc.) but partly also since setting up an analysis like this might take only 1 hour for an experienced analyst and a small project or several months for a big one. No chance to break this down to a single post.

However, I assume that you already have watched the video tutorials we are providing:

http://rapid-i.com/content/view/189/212/

More tutorials, including one about some text mining basics, can be found on

http://rapidminerresources.com/

Check out also the web site of Thomas Ott (giving an invited talk next week at the RCOMM):

http://www.neuralmarkettrends.com/

And last but not least, Rapid-I provides training courses and webinars for text mining which you will find in our shop:

http://rapid-i.com/component/option,com_virtuemart/Itemid,180/lang,de/vmcchk,1/

The last option will certainly lead to the best results in the shortest time, and I am not only saying that because I represent Rapid-I ;D

Cheers,
Ingo

Raghav · September 2010

Hi Ingo,
Thanks for your reply. I am also happy to hear that text mining on unstructured data is possible.
To explain my scenario in detail: I have an Excel file with comments from customers of a reputed bank in the US.

Case-1: (a) "I think the bank could possible extend branch hours during the week to improve convience for its customers"
(b) "I think if the branch hours were extended by an hour,it would be appreciated.Maybe you could look at closing at 5 P.M. on Thursday's would be really nice.Thank you."
(c) "I find that the branch hours should be extended beyond 3PM during the week at least twice a week"
From (a), (b), and (c) we can clearly see that a set of customers wants the branch hours extended, even though each comment is worded quite specifically. I would like to group these three into one category (namely, "Extend branch hours") so that it is easy for the bank to act by extending the hours to satisfy these customers.

Case-2: (a) "I would actually like to change my accounts to the 25th Ave office, from the E. Pleasant office. The office staff at 25th Ave are exceptional!"
(b) "the 25th avenue office is always a pleasure to visit. They employees are always pleasant and helpful. I can't say the same for other branch offices"
(c) "never a problem at 25th ave great people work there!"
The above case clearly shows that a set of customers is very satisfied with the branch on 25th Ave, and I would like to make these comments a group as well.

Of course, many groups like this will form in the data. From those we would pick the groups that are most frequent, actionable, top priority, sensible, etc.

I hope this makes my requirements clear. Please advise me on how to go about this.

Also, FYI, I am currently using the open-source version and could not find the "Text Processing" option in it. I guess it requires some plug-in (but I am not sure whether that is open source or not). I am stuck here and would really like to make progress on this. One more thing: if this is possible with RapidMiner, my company may be willing to buy the product.

Thanks,
Raghav

IngoRM · September 2010

Hi Raghav,

"But I am afraid I am not able to give you all the details necessary for such an analysis within one single short post here."

By the sentence above I meant "I am not going to give free consulting here" and not "I would need more information before I give free consulting" :P

However, the basic idea would be to transform the texts into a table with the Text Extension and perform a clustering on it. You could then calculate typical terms for the clusters or - often better - prototypical texts. If you have more concrete questions about how to exactly do this: please post in the appropriate boards here in the forum or consider booking a training.
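To make the idea above a bit more concrete outside of RapidMiner, here is a minimal plain-Python sketch. It assumes a TF-IDF weighting for the "table", cosine-based (spherical) k-means for the clustering, and a tiny hand-made stopword list; none of these choices come from the Text Extension itself, and all function names are made up for illustration. The sample texts are four of the customer comments quoted earlier in this thread.

```python
import math
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption, not RapidMiner's list).
STOPWORDS = {"a", "an", "and", "are", "at", "be", "for", "i", "is", "it",
             "its", "that", "the", "there", "they", "to", "could", "should"}

def tokenize(text):
    """Lowercase, split on non-alphanumerics, drop stopwords."""
    return [t for t in re.findall(r"[a-z0-9]+", text.lower())
            if t not in STOPWORDS]

def normalize(vec):
    """Scale a sparse {term: weight} vector to unit length."""
    norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
    return {t: w / norm for t, w in vec.items()}

def tfidf_vectors(docs):
    """Turn each text into one unit-length row of the term table."""
    token_lists = [tokenize(d) for d in docs]
    doc_freq = Counter(t for tokens in token_lists for t in set(tokens))
    n = len(docs)
    rows = []
    for tokens in token_lists:
        # term count times a smoothed inverse document frequency
        row = {t: c * (math.log((1 + n) / (1 + doc_freq[t])) + 1.0)
               for t, c in Counter(tokens).items()}
        rows.append(normalize(row))
    return rows

def cosine(a, b):
    """Cosine similarity of two unit-length sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())

def kmeans(rows, k, max_iter=20):
    """Cosine-based k-means with deterministic farthest-first seeding."""
    centroids = [rows[0]]
    while len(centroids) < k:
        far = min(range(len(rows)),
                  key=lambda i: max(cosine(rows[i], c) for c in centroids))
        centroids.append(rows[far])
    labels = None
    for _ in range(max_iter):
        new_labels = [max(range(k), key=lambda j: cosine(r, centroids[j]))
                      for r in rows]
        if new_labels == labels:
            break
        labels = new_labels
        for j in range(k):
            members = [r for r, l in zip(rows, labels) if l == j]
            if members:
                total = Counter()
                for r in members:
                    total.update(r)  # adds the weights term by term
                centroids[j] = normalize(total)
    return labels, centroids

def top_terms(centroid, n=3):
    """Typical terms of a cluster: the heaviest terms of its centroid."""
    return sorted(centroid, key=centroid.get, reverse=True)[:n]

def prototype(rows, labels, centroid, j):
    """Index of the text in cluster j that is closest to its centroid."""
    members = [i for i, l in enumerate(labels) if l == j]
    return max(members, key=lambda i: cosine(rows[i], centroid), default=None)

comments = [
    "I think the bank could possible extend branch hours during the week "
    "to improve convience for its customers",
    "I find that the branch hours should be extended beyond 3PM during "
    "the week at least twice a week",
    "the 25th avenue office is always a pleasure to visit. They employees "
    "are always pleasant and helpful. I can't say the same for other "
    "branch offices",
    "never a problem at 25th ave great people work there!",
]
rows = tfidf_vectors(comments)
labels, centroids = kmeans(rows, k=2)
for j, c in enumerate(centroids):
    print("cluster", j, "typical terms:", top_terms(c),
          "prototype text index:", prototype(rows, labels, c, j))
```

With 20K real comments you would want stemming, a proper stopword list, and a sensible way of picking the number of clusters; the sketch only shows how the pieces fit together.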

"Also, FYI, I am currently using the open-source version and could not find the 'Text Processing' option in it. I guess it requires some plug-in (but I am not sure whether that is open source or not). I am stuck here and would really like to make progress on this. One more thing: if this is possible with RapidMiner, my company may be willing to buy the product."

You will need the Text Extension (see http://rapid-i.com/content/view/202/206/ for more information about our RapidMiner Extensions).

"One more thing: if this is possible with RapidMiner, my company may be willing to buy the product."

Would be great. But let me add that our support has the biggest value during the creation of your project, not after you have successfully implemented it, so I would not wait until then. Especially not since we currently have a nice promotion offer. End of advertising ;D

Cheers,
Ingo

deelmanibaral · September 2010

Hi,

I want to classify several attributes in a database table (or Excel sheet) into more than 5 categories.
Can you suggest how I can accomplish this task?
Is it possible with RapidMiner or not?

IngoRM · September 2010

Yes. Everything else has been said before. If you have a concrete problem, please post to the correct board here in the forum.

Cheers,
Ingo

merleyc · October 2010

Hi,

I am a new Rapid-I user and I am excited about the results I am looking for.
I am interested in the Text Extension, which supports several text formats including plain text, HTML, and PDF. It also provides standard filters for tokenization, stemming, stopword filtering, and n-gram generation.

My question is: I have some plain text in Portuguese or Spanish. I want to apply the Rapid-I filters (tokenization, stemming, stopword filtering) and, at the end, generate n-grams from these texts.
What are the steps to do this?

Thanks!

Rene · October 2010

Ralf Klinkenberg did an excellent tutorial-video on
the 1st steps in text-mining using RapidMiner 5:

==> http://www.youtube.com/watch?v=TyZjom46yGA

It will probably answer your questions.

Greets,
René

Edit:
Plus consider examining this Example from Shaily on
MyExperiment.org:

==> http://www.myexperiment.org/workflows/1465.html
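
For readers who want to see what the steps asked about above (tokenization, stopword filtering, stemming, n-gram generation) amount to, here is a small plain-Python sketch for Spanish text. It is not how the Text Extension implements them: the stopword list is a tiny illustrative sample, `naive_stem` is a made-up placeholder that only strips a plural "s" (a real pipeline would use a proper Spanish or Portuguese stemmer), and joining n-grams with "_" is just a convention chosen here.

```python
import re

# Tiny illustrative Spanish stopword list (an assumption; real lists are
# much longer, and Portuguese would need its own list).
STOPWORDS_ES = {"a", "con", "de", "el", "en", "es", "la", "las", "los",
                "para", "por", "que", "se", "un", "una", "y"}

def tokenize(text):
    """Lowercase and keep runs of letters, including accented ones."""
    return re.findall(r"[^\W\d_]+", text.lower())

def naive_stem(token):
    """Placeholder stemmer: strips a final 's' only. Not a real stemmer."""
    return token[:-1] if len(token) > 3 and token.endswith("s") else token

def preprocess(text, stopwords=STOPWORDS_ES):
    """Tokenize, drop stopwords, then stem what is left."""
    return [naive_stem(t) for t in tokenize(text) if t not in stopwords]

def ngrams(tokens, n=2):
    """Join every run of n consecutive tokens with an underscore."""
    return ["_".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

terms = preprocess("Las horas de la sucursal son muy cortas")
print(terms)             # ['hora', 'sucursal', 'son', 'muy', 'corta']
print(ngrams(terms, 2))  # ['hora_sucursal', 'sucursal_son', 'son_muy', 'muy_corta']
```

The order matters: stopwords are removed before stemming so that the list can contain plain word forms, and n-grams are built last so that they are made of the cleaned terms.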
