
Text analysis on a collection of documents

1566827 Member Posts: 2 Learner III
edited November 2018 in Help

Hello everybody,

I have a huge CSV file with several columns; one of them contains a different text in each row. I need to run text analysis (stemming, etc.) on each of these texts separately, for instance to calculate the relative frequency of certain words or to apply more complex estimation methods.

I tried selecting only the column of interest and went on from there, but I do not know how to stem a collection of documents or an example set, nor which operators can systematically apply the same operation to each of these texts without merging them.

Could you please give me a hint on how to proceed? I hope that was clear.

Thank you very much,

Francesco

Answers

  • sgenzer Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, Community Manager, Member, University Professor, PM Moderator Posts: 2,959 Community Manager

    It seems like you should use the "Data to Documents" operator in the Text Processing extension. You will probably want to convert only one attribute: check the "select attributes and weights" checkbox, then in Edit List select your text attribute and leave the weight at 1.0 (the value doesn't really matter). That attribute will then be treated as a "document", and you can apply any of the Text Processing operators (e.g. stemming) to it.

    Scott
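To illustrate what "one document per row, processed separately" means, here is a minimal Python sketch that runs outside RapidMiner. It assumes a small in-memory CSV with a hypothetical `text` column. It is not how the Text Processing extension is implemented. `naive_stem` is a hypothetical toy suffix stripper standing in for a real stemmer (such as Porter), and `analyze_document` and `analyze_csv` are illustrative names. Each row is tokenized, stemmed, and counted independently. The relative frequency of a term is its count divided by the total number of tokens in that document, so the texts are never merged.

```python
import csv
import io
import re
from collections import Counter


def naive_stem(token: str) -> str:
    """Hypothetical toy stemmer: strips a few English suffixes.
    NOT the Porter algorithm and NOT RapidMiner's stemming operator."""
    for suffix in ("ing", "ed", "es", "s"):
        # Keep at least 3 characters so short words are left alone.
        if token.endswith(suffix) and len(token) - len(suffix) >= 3:
            return token[: -len(suffix)]
    return token


def analyze_document(text: str) -> dict[str, float]:
    """Tokenize, stem, and compute relative term frequencies for ONE document."""
    tokens = [naive_stem(t) for t in re.findall(r"[a-z]+", text.lower())]
    counts = Counter(tokens)
    total = sum(counts.values())
    # Relative frequency = term count / number of tokens in this document only.
    return {term: n / total for term, n in counts.items()} if total else {}


def analyze_csv(csv_text: str, text_column: str) -> list[dict[str, float]]:
    """Treat each row's text as its own document; rows are never merged."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [analyze_document(row[text_column]) for row in reader]


# Hypothetical sample data: two rows, one text per row.
sample = 'id,text\n1,"Dogs walked, dogs walking"\n2,The cat sleeps\n'
results = analyze_csv(sample, "text")
# results[0] == {"dog": 0.5, "walk": 0.5}
```

For real data, a proper stemmer (for example NLTK's `PorterStemmer`) would replace the toy function, and a huge file would be streamed from disk rather than held in a string.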
