How to check if some specific indicators are mentioned in a set of business reports?
We are analysing some business annual reports (13 reports in PDF format). We are new to
RapidMiner, but thanks to the training resources and the answers in the
community we managed to run a cluster analysis of the parts of the annual reports
we are interested in. For that analysis we used the Process Documents from Files
operator to extract the words, which are then used by the clustering operator.
Now we are interested in a different analysis. We do not want RapidMiner to extract the list of words from the reports, because we already have a given word list: we want to check whether each of the
given indicators (words) is mentioned in the business reports. However,
I have not seen any example that shows how to create a process to get this
result. I would be very grateful if you could help me with an example or an
indication of the operators to use.
Thanks
Answers
I worked on a similar project some months ago.
The process in the attached file extracts the sentence(s) of the report where the keyword(s) appear(s).
To run the process in the attached file, you will need to:
- install Python on your computer
- install the Python Scripting extension
If this process does not fit your use case, please provide at least 2 representative PDF reports and a list
of indicators (words).
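For readers without the attachment, the general idea can be sketched in plain Python. This is not the attached process, only a minimal illustration. It assumes the text of each report has already been extracted from the PDF into a string, and it uses a naive sentence splitter. The function names (`split_sentences`, `find_indicator_sentences`) and the sample reports and indicators are hypothetical.

```python
import re


def split_sentences(text):
    # Naive splitter (assumption): break after '.', '!' or '?' followed by whitespace.
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_indicator_sentences(reports, indicators):
    """Map each (report, indicator) pair to the sentences that mention the indicator."""
    results = {}
    for name, text in reports.items():
        sentences = split_sentences(text)
        for indicator in indicators:
            # Whole-word, case-insensitive match; re.escape keeps any
            # special characters in the indicator literal.
            pattern = re.compile(r"\b" + re.escape(indicator) + r"\b", re.IGNORECASE)
            results[(name, indicator)] = [s for s in sentences if pattern.search(s)]
    return results


# Hypothetical sample data standing in for the text extracted from the PDF reports.
reports = {
    "report_a": "Revenue grew by 5%. Our carbon emissions fell. Staff turnover was stable.",
    "report_b": "The board met four times. Revenue was flat.",
}
indicators = ["revenue", "carbon emissions", "turnover"]

hits = find_indicator_sentences(reports, indicators)
for (name, indicator), sentences in hits.items():
    status = "mentioned" if sentences else "not mentioned"
    print(name, "|", indicator, "|", status, "|", sentences)
```

An empty list for a (report, indicator) pair means the indicator was not found in that report. Inside RapidMiner, a script along these lines would have to be adapted to the input and output format expected by the Python Scripting extension.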
Regards,
Lionel