Text Mining: Extracting Data from Text
Raphael2304
Member Posts: 4 Learner I
in Help
Dear all,
I have a small problem concerning text mining with RapidMiner. I have a bunch of press releases, all structured the same way. I want to extract the headline of each press release (1st line), the date it was published (2nd line), and the coloured parts of the release, as well as the whole paragraph in which each coloured part was found. All releases are in one .rtf file and are separated by section breaks. Any idea how to do this the fastest way possible?
Thanks a lot in advance!
Best
Raphael
Best Answer
kayman
Member Posts: 662 Unicorn
Using a combination of split and some regex that looks at newlines should do the trick.
I've attached a very rough example that can get you started.
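As a rough illustration of the split-plus-regex idea (written in Python rather than as a RapidMiner process, and not the attached example), a minimal sketch could look like the following. It assumes the .rtf file has already been converted to plain text, that section breaks came out as form feeds, and that coloured spans were marked as `[[...]]` during conversion. Those two conventions and the function names are hypothetical placeholders, so adjust them to whatever your conversion step actually produces.

```python
import re

# Assumptions (hypothetical, not from the original post):
# - releases are separated by a form feed after RTF-to-text conversion
# - coloured spans were wrapped in [[...]] by the conversion step
SECTION_BREAK = "\f"
HIGHLIGHT = re.compile(r"\[\[(.+?)\]\]")


def parse_release(release):
    """Extract headline (1st line), date (2nd line) and highlighted spans."""
    lines = [line.strip() for line in release.strip().splitlines()]
    headline = lines[0] if lines else ""
    date = lines[1] if len(lines) > 1 else ""
    body = "\n".join(lines[2:])
    hits = []
    # Paragraphs are taken to be blocks separated by one or more blank lines.
    for paragraph in re.split(r"\n\s*\n", body):
        for span in HIGHLIGHT.findall(paragraph):
            hits.append({
                "highlight": span,
                # Store the paragraph with the markers removed.
                "paragraph": HIGHLIGHT.sub(r"\1", paragraph).strip(),
            })
    return {"headline": headline, "date": date, "hits": hits}


def parse_releases(text):
    """Split the full text on section breaks and parse each release."""
    return [parse_release(part) for part in text.split(SECTION_BREAK) if part.strip()]
```

Calling `parse_releases(text)` on the converted text returns one dictionary per release. The same logic maps onto a split step followed by regex extraction per document.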
Answers
Thanks a lot for your answer. I will try your solution right now!