Extract e-mail addresses out of a PDF
marcel_hanselma
Member Posts: 3 Learner I
Hello dear RapidMiner community,
I have a PDF full of addresses (name, street, phone number, e-mail). I want to extract only the e-mail addresses and store them, one per line, in an Excel or CSV file. What is the best approach? (I am really a RapidMiner newbie.)
Greetings, Marcel
Best Answer
lionelderkrikor RapidMiner Certified Analyst, Member Posts: 1,195 Unicorn
Hi @marcel_hanselma,
Although your PDF is a scan and is not nicely formatted, it is workable: the e-mail addresses can be extracted. I used the "Read Document" operator, as mentioned by Jacob.
I used a Python script to search for, extract and display the e-mail addresses, because this is very easy in Python.
(With RapidMiner's native operators, I was unable to extract ALL the occurrences; I could only find and extract the first one.)
So, to run the process in the attached file, you will need:
- to install Python on your machine (you can install it via Anaconda)
- to install the Python Scripting extension from the Marketplace. Don't forget to set, in the RapidMiner settings, the path where your Python.exe file is installed.
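The attached process itself is not reproduced in this thread. As a rough sketch of the kind of Python script described above, and not the actual script from the attachment, the example below assumes the PDF text has already been extracted into a plain string. The pattern and the function names `extract_emails` and `write_emails_csv` are illustrative choices; the regular expression is deliberately simple and will not cover every valid address format.

```python
import csv
import re

# Simplified pattern (an assumption for illustration): it matches typical
# addresses but is not a full RFC 5322 validator.
EMAIL_PATTERN = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(text):
    """Return every e-mail address found in text, in order of first
    appearance, skipping case-insensitive duplicates."""
    seen = set()
    emails = []
    # findall returns ALL non-overlapping matches, not just the first one.
    for match in EMAIL_PATTERN.findall(text):
        key = match.lower()
        if key not in seen:
            seen.add(key)
            emails.append(match)
    return emails


def write_emails_csv(emails, path):
    """Write one e-mail address per row to a CSV file with an 'email' header."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.writer(handle)
        writer.writerow(["email"])
        for email in emails:
            writer.writerow([email])
```

Calling `write_emails_csv(extract_emails(text), "emails.csv")` on the extracted text would then give one address per line. On a scanned PDF the result depends on how cleanly the text was recognized, so a few addresses may come out truncated or be missed.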
Hope this helps,
Regards,
Lionel
PS: Given that there are more than 1700 e-mail addresses in your document, the process is not instantaneous: you have to wait around 2 minutes.
Answers
Can you provide your .pdf file so that we can see how to extract the e-mail addresses?
You can send it via private message if it is confidential.
Regards,
Lionel
It worked flawlessly. :-)