Input file format for Process Documents From File operator

ccricha · May 2017

Does anyone know what text structure is expected or can be parsed using the Process Documents from Files operator? I am working on Ch 15 of the book written by Markus Hofmann and Ralf Klinkenberg. They use the Process Documents from Files operator to loop over a bunch of text files containing hotel rating data. An entry for a single hotel looks like this:

<Author>everywhereman2
<Content>Truncated for brevity....
<Date>Jan 6, 2009
<Rating>5 5 5 5 5 5 5 5

What irks me is that there is absolutely nothing in the documentation for this operator telling me that this is an acceptable text structure that can be parsed. Does anyone happen to know more about this operator?
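For anyone who wants to inspect files in this layout outside RapidMiner, here is a minimal Python sketch. It is not how the operator works internally, which is exactly what the documentation leaves unstated. It assumes each field sits on its own line behind a `<Tag>` marker, that blank lines separate entries, and that untagged lines continue the previous field. The name `parse_entries` is made up for illustration.

```python
import re

# Matches a line such as "<Author>everywhereman2": tag name, then the value.
TAG_LINE = re.compile(r"^<(\w+)>(.*)$")


def parse_entries(text):
    """Split tagged lines into one dict per entry.

    Assumptions (not taken from any RapidMiner documentation):
    entries are separated by blank lines, each field starts with <Tag>,
    and an untagged line continues the previous field.
    """
    entries, current, last_tag = [], {}, None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            # A blank line closes the entry collected so far.
            if current:
                entries.append(current)
            current, last_tag = {}, None
            continue
        match = TAG_LINE.match(line)
        if match:
            last_tag = match.group(1)
            current[last_tag] = match.group(2).strip()
        elif last_tag is not None:
            current[last_tag] += " " + line
    if current:
        entries.append(current)
    return entries


sample = """<Author>everywhereman2
<Content>Truncated for brevity....
<Date>Jan 6, 2009
<Rating>5 5 5 5 5 5 5 5
"""

entries = parse_entries(sample)
# The rating line holds several space-separated integer scores.
ratings = [int(score) for score in entries[0]["Rating"].split()]
```

With the sample entry above, `entries[0]` holds the keys `Author`, `Content`, `Date`, and `Rating`, and `ratings` is a list of eight integers.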

Thomas_Ott · May 2017

The operator reference for the Text Processing extension is a bit sparse.

What I would do is review the Text Analytics KB and watch these videos on how to properly load/parse text data and build models from it.

I will be recording a very detailed and updated Text Mining in RapidMiner video over the next few weeks.

ccricha · May 2017

Are there plans to update the documentation for this extension? Even just some JavaDoc would be better than nothing.
