hi everyone
Mahmud_elabo
Member Posts: 7 Learner I
in Help
I'm new to RapidMiner. I have about 200 PDF files that I want to text-mine, and I need just the keywords from those files.
Can anyone help?
Thanks in advance.
Best Answer
yyhuang Administrator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 364 RM Data Scientist
Hi @Mahmud_elabo,
The first step is to extract the text from the PDFs.
For that you will need the "Process Documents from Files" operator from the Text Processing extension. The Academy has more demo videos on vectorization and keyword extraction (e.g. TF-IDF):
https://academy.rapidminer.com/catalog?query=text mining
In the operator you can set the directory where the PDF files are stored. If the text in a PDF is stored as images, you may need a third-party OCR (optical character recognition) tool.
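For anyone who wants to see what TF-IDF keyword scoring does underneath, here is a minimal sketch in plain Python. It is not RapidMiner's implementation. It assumes the PDF text has already been extracted into strings, and the file names, the sample texts, the tiny stop-word set and the function name `tfidf_keywords` are all made up for illustration.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list; a real one would be much longer.
STOP_WORDS = {"and", "the", "into"}

def tokenize(text):
    """Lowercase the text and keep alphabetic tokens that are not stop words."""
    return [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_WORDS]

def tfidf_keywords(docs, top_n=3):
    """docs maps a document name to its text.
    Returns a dict mapping each name to its top_n (term, score) pairs."""
    tokens = {name: tokenize(text) for name, text in docs.items()}
    n_docs = len(tokens)

    # Document frequency: in how many documents does each term appear?
    df = Counter()
    for words in tokens.values():
        df.update(set(words))

    result = {}
    for name, words in tokens.items():
        tf = Counter(words)
        total = len(words) or 1
        # Relative term frequency times log inverse document frequency.
        scores = {t: (c / total) * math.log(n_docs / df[t]) for t, c in tf.items()}
        # Highest score first; ties broken alphabetically for stable output.
        ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
        result[name] = ranked[:top_n]
    return result

# Placeholder texts standing in for text already extracted from PDFs.
docs = {
    "a.pdf": "solar panels convert sunlight into power and solar cells store power",
    "b.pdf": "wind turbines convert wind into power",
    "c.pdf": "the report compares power costs",
}
keywords = tfidf_keywords(docs, top_n=2)
for name, terms in keywords.items():
    print(name, [term for term, _ in terms])
```

A word such as "power" that occurs in every document gets a score of zero, which is why TF-IDF tends to surface terms that are distinctive for a single file rather than common across all of them.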
Hope it helps.
YY
Answers
But I still could not extract only the keywords and build a word-frequency table
for just those keywords in the PDF files.
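To make the goal concrete, a word-frequency table restricted to a chosen keyword list could look roughly like the plain-Python sketch below. This is outside RapidMiner and only an illustration: it assumes the text has already been extracted from each PDF, and the function name `keyword_frequency_table`, the sample texts and the keyword list are hypothetical.

```python
import re
from collections import Counter

def keyword_frequency_table(docs, keywords):
    """Count only the given keywords in each document.
    Returns rows of (document name, keyword, count), keywords in alphabetical order."""
    wanted = {k.lower() for k in keywords}
    rows = []
    for name, text in docs.items():
        words = re.findall(r"[a-z]+", text.lower())
        counts = Counter(w for w in words if w in wanted)
        for word in sorted(wanted):
            rows.append((name, word, counts[word]))  # a missing keyword counts as 0
    return rows

# Placeholder texts standing in for text already extracted from PDFs.
docs = {
    "a.pdf": "Solar power and solar cells",
    "b.pdf": "Wind power",
}
table = keyword_frequency_table(docs, ["solar", "power"])
for row in table:
    print(row)
```

Each row is one (file, keyword, count) triple, so the result can be written straight to a CSV file or pivoted into one column per keyword.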