Text mining on a specific section in PDF files

Mahmud_elabo · January 2021

I want to do text mining on a specific section (for example, just the abstracts) of PDF files.
Can anyone help here, please?
Thanks so much in advance.

kayman · January 2021

The Read Document operator allows you to read your PDF as text, so you can use all of the text mining / NLP magic as if it were a text file.
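Once the PDF is available as plain text, pulling out one section mostly comes down to finding its heading and the heading that follows it. Here is a minimal sketch of that idea in Python (not a RapidMiner process). It assumes the text has already been extracted, that the section starts with a heading such as "Abstract" at the beginning of a line, and that it ends at the next known heading. The function name `extract_section` and its default headings are hypothetical, chosen only for illustration; real papers will need their own list of headings.

```python
import re

def extract_section(text, start="Abstract", stops=("Introduction", "Keywords")):
    """Return the text between the `start` heading and the first `stops` heading.

    Assumption: headings sit at the start of a line, optionally numbered
    (for example "1. Introduction"). Returns None if `start` is not found.
    """
    stop_alt = "|".join(re.escape(s) for s in stops)
    pattern = re.compile(
        rf"^\s*(?:\d+\.?\s*)?{re.escape(start)}\b[:.\s]*"  # the start heading
        rf"(.*?)"                                          # section body, as short as possible
        rf"(?=^\s*(?:\d+\.?\s*)?(?:{stop_alt})\b|\Z)",     # next heading, or end of text
        re.IGNORECASE | re.DOTALL | re.MULTILINE,
    )
    match = pattern.search(text)
    return match.group(1).strip() if match else None

# Usage with a made-up document:
sample = "A Title\n\nAbstract\nWe study text mining.\n\n1. Introduction\nPDFs are common."
abstract = extract_section(sample)
introduction = extract_section(sample, start="Introduction", stops=("Methods",))
```

If a document has no recognisable heading, the function returns None, so those files can be listed and checked by hand.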

Mahmud_elabo · January 2021

@kayman I tried that, but I have 200 PDF files, and as I mentioned I need to do text mining on just a specific section, such as the abstracts or the introductions.

kayman · January 2021

Then you need to combine it with loop documents. Point it to the folder with your PDFs and extract the data that you need, one by one, up to number 200.

So basically, create a process that works for one file first, and then use it to loop through all your PDFs one by one. Whether it's 1, 20, 200 or 2000 PDFs doesn't make a difference.

You just have to decide if you want the outcome combined in a collection or finalise it in the loop process.
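The "make it work for one, then loop" pattern can be sketched in Python as follows (again, this is not a RapidMiner process). It assumes you supply two functions of your own: `read_text`, which turns one PDF path into a string (for example a wrapper around a PDF library of your choice), and `extract`, which pulls the wanted section out of that string. The names `process_folder`, `read_text` and `extract` are hypothetical. Here the outcome is combined in one collection, a dictionary keyed by file name.

```python
from pathlib import Path

def process_folder(folder, read_text, extract, pattern="*.pdf"):
    """Apply the single-document steps to every matching file in `folder`.

    read_text: callable taking a Path and returning the document text (assumed, user-supplied).
    extract:   callable taking that text and returning the section you need (assumed, user-supplied).
    Returns a dict mapping file name to the extracted result.
    """
    results = {}
    # sorted() keeps the processing order stable from run to run
    for path in sorted(Path(folder).glob(pattern)):
        text = read_text(path)               # step 1: PDF to text
        results[path.name] = extract(text)   # step 2: keep only the wanted section
    return results
```

The loop does not care whether the folder holds 1 or 2000 files. To finalise inside the loop instead of collecting, replace the dictionary assignment with whatever per-document step you need.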

Mahmud_elabo · January 2021

@kayman thank you so much. I wonder if there is any video or tutorial showing this process.

kayman · January 2021

Have you tried RapidMiner Academy? There is plenty of training on NLP / text mining there, and on loops. You may need to combine a few, but there is a ton of info there.

Also, looking on YouTube will provide some good info on text mining with RapidMiner.

Mahmud_elabo · January 2021

@kayman yes, I have tried the RapidMiner community and I looked for this process on YouTube, but I did not find anything about what I need.

Altair RapidMiner Community