
Hi! Question on data extraction steps for Word & PDFs

pimlico35 Member Posts: 4 Learner I
Hi folks,
I'm new to this software and trying to figure out some basics... ;)

I have Word and PDF files (they're reports from various companies), and I want to search them for keywords (there are about 20 I'm interested in) to find out how often each one occurs. Ideally, I'd like to search the documents and pull the counts into a spreadsheet. It's very basic, but I can't figure out how to do it... ;(

I've put the docs into the folder and tried to extract the data, but then I get lost as I'm not sure what to do next. If there's a quick step-by-step guide, that would be great. Apologies if this has been asked before, but I couldn't find it.

Many thanks!

Best Answer

  • pimlico35 Member Posts: 4 Learner I
    Solution Accepted
    Thanks Martin - I will try that now. I'm just trying to find my way around the operators and what the steps are to get it to work!


Answers

  • MartinLiebig Administrator, Moderator, Employee-RapidMiner, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 3,533 RM Data Scientist
    Hi,
    did you try the Read Office operator?

    Best,
    Martin
    - Sr. Director Data Solutions, Altair RapidMiner -
    Dortmund, Germany
  • pimlico35 Member Posts: 4 Learner I
    Thanks - I needed to get the extension; works great now! I just need to figure out how to extract the keywords and their frequencies from the documents into a table... :)
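
For anyone with the same open question, the remaining step (keyword counts per document, written out as a table) could be sketched outside RapidMiner roughly as follows. This is only an illustration in plain Python, not a RapidMiner process: it assumes the text of each Word/PDF report has already been extracted into a string, and the names `keyword_counts`, `write_table`, and the sample documents are hypothetical.

```python
import csv
import io
import re


def keyword_counts(text, keywords):
    """Count case-insensitive, whole-word occurrences of each keyword or phrase."""
    return {
        kw: len(re.findall(rf"\b{re.escape(kw)}\b", text, flags=re.IGNORECASE))
        for kw in keywords
    }


def write_table(docs, keywords, out):
    """Write one CSV row per document: the document name, then one count per keyword."""
    writer = csv.writer(out)
    writer.writerow(["document"] + list(keywords))
    for name, text in docs.items():
        counts = keyword_counts(text, keywords)
        writer.writerow([name] + [counts[kw] for kw in keywords])


# Made-up sample data standing in for text already extracted from the reports.
docs = {
    "report_a": "Risk and risk management drive growth.",
    "report_b": "Growth slowed; no material risks were reported.",
}
keywords = ["risk", "growth", "profit"]

buffer = io.StringIO()  # swap for open("counts.csv", "w", newline="") to write a file
write_table(docs, keywords, buffer)
print(buffer.getvalue())
```

The resulting CSV opens directly in a spreadsheet. Matching is on whole words, so "risks" is not counted as "risk"; if word variants should count together, the text would need stemming or the keyword list would need to include the variants.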