Text analysis on a collection of documents

1566827 · September 2016

Hello everybody,

I have a huge CSV file with several columns, one of which contains a different text in each row. I need to perform text analysis (stemming, etc.) on each of these documents separately, for instance to calculate the relative frequency of some words or to apply more complex estimation methods.
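For illustration only, here is a minimal Python sketch (not a RapidMiner process) of what "relative frequency of some words" per document means: the count of a target word divided by the total number of tokens in that one document. The function name `relative_frequencies` and the simple letters-only tokenizer are assumptions made for this example, and no stemming is applied here.

```python
import re
from collections import Counter

def relative_frequencies(text, targets):
    """Share of one document's tokens accounted for by each target word."""
    # Lowercase and split on runs of letters; a deliberately simple tokenizer.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    total = len(tokens)
    # Guard against empty documents to avoid division by zero.
    return {w: (counts[w] / total if total else 0.0) for w in targets}

docs = [
    "The market rose and the market closed higher.",
    "Rain today, rain tomorrow.",
]
# Each document is scored on its own; counts are never pooled across rows.
for i, doc in enumerate(docs):
    print(i, relative_frequencies(doc, ["market", "rain"]))
```

The first document has 8 tokens, two of which are "market", so its relative frequency for "market" is 0.25; the second has 4 tokens, two of which are "rain", giving 0.5.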

I managed to select only the column of interest, but I don't know how to stem a collection of documents (or an example set), or how to apply the same operation systematically to each of these texts without merging them.
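As a rough illustration of "the same operation on each text, without merging", here is a Python sketch (again not RapidMiner) that reads a CSV, takes one column, and stems every row on its own. The column name `text` and the sample data are hypothetical, and `toy_stem` is a crude suffix stripper standing in for a real stemmer such as Porter's; it is not that algorithm.

```python
import csv
import io
import re

def toy_stem(word):
    """Crude suffix stripping; a stand-in for a real stemmer such as Porter."""
    for suffix in ("ing", "ed", "es", "s"):
        # Only strip if a stem of at least 3 characters remains.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def stem_document(text):
    """Tokenize one document and stem each token."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [toy_stem(t) for t in tokens]

def stem_column(csv_file, column):
    """Yield (row_number, stemmed_tokens) per row; documents stay separate."""
    reader = csv.DictReader(csv_file)
    for row_number, row in enumerate(reader, start=1):
        yield row_number, stem_document(row[column])

# Hypothetical sample standing in for the real CSV file.
sample = io.StringIO(
    "id,author,text\n"
    '1,anna,"Walking and talked"\n'
    '2,bruno,"The cats played"\n'
)
for row_number, stems in stem_column(sample, "text"):
    print(row_number, stems)
```

For a file on disk, the same generator works with `open("data.csv", newline="", encoding="utf-8")` in place of the `StringIO` sample (the filename is a placeholder). Because it yields one row at a time, a large file does not have to be loaded into memory at once.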

Could you please give me a hint on how to proceed? I hope my question is clear.

Thank you very much,

Francesco
