Before, after, timestamp paragraph
How can paragraphs be extracted from free text that uses different types of date-time representation?
Case:
Patient x, birth date February 5 1960
At age of 5 years, dental surgery
20/10/2012 laparoscopy with postoperative infection with pseudomonas, allergy for antibiotics without further investigation
2010 traffic accident
1976-03-10 ankle surgery
Today admitted to ICU
This text should be processed into two columns: Text and Date.
If this extraction were bulletproof, a Sankey chart could be built from a cohort of patients.
Thanks
Sven
Answers
So each line becomes a text & date?
Date | Text
N/A | At age of 5 years, dental surgery
20/10/2012 | laparoscopy with postoperative infection with pseudomonas, allergy for antibiotics without further investigation
01/06/2010 | traffic accident *estimated
1976-03-10 | ankle surgery
now() | admitted to ICU
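The two estimated rows in the table above (a year-only date and "Today") can be sketched in Python as follows. This is only an illustration under assumptions: the function name `estimate_date` is hypothetical, "01/06/2010" is read as 1 June (a mid-year placeholder, day first as in 20/10/2012), and `now()` is represented by a date passed in by the caller.

```python
import re
from datetime import date
from typing import Optional, Tuple

# A line that starts with a bare four-digit year followed by text, e.g. "2010 traffic accident".
YEAR_ONLY = re.compile(r"^\s*((?:19|20)\d{2})\s+(?!\d)")

def estimate_date(line: str, today: date) -> Tuple[Optional[date], str, bool]:
    """Return (date, text, estimated) for year-only and 'Today' lines.

    Hypothetical helper: year-only dates are placed on 1 June of that year
    and flagged as estimated; 'Today' maps to the date supplied by the caller.
    """
    match = YEAR_ONLY.match(line)
    if match:
        return date(int(match.group(1)), 6, 1), line[match.end():].strip(), True
    stripped = line.strip()
    if stripped.lower().startswith("today"):
        return today, stripped[len("today"):].strip(), False
    # Anything else is left for other rules to handle.
    return None, stripped, False
```

Keeping the "estimated" flag as its own value makes it possible to filter or mark such rows later instead of mixing them with exact dates.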
Thanks for the feedback.
In fact I need a date column (sometimes calculated, e.g. "at age of 5 years" should be calculated from the date of birth) and a text column. The problem is that in free text a date sometimes relates to the text that follows it and sometimes to the text that precedes it.
Is there a trick to end up with 2 columns, text and date?
Thanks
There is no simple "trick" IMO, it's going to get complex. However, with a structured database (perhaps look into Neo4J for this?) then it should be possible.
What you're looking to do is build relationships between date-stamped text entities and other date-stamped text entities.
Those relationships can then help guide the extraction of new date-stamped text entities from text containing dates.
Hi,
your best bet is using a list of rules from experience in Generate Attributes. For example:
This catches three of the dates in your input and is already becoming hard to read. You could always get conversion errors from stuff like 33/11/2017 and so on, so it would be best to apply this line by line in an exception handler.
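For readers outside RapidMiner, the same idea (a list of rules, applied line by line inside an exception handler) could look roughly like this in Python. This is a sketch, not the Generate Attributes expression from this post: the names `RULES` and `extract_date` are made up for illustration, slash dates are assumed to be day first, and month names are assumed to be English.

```python
import re
from datetime import date, datetime
from typing import Optional, Tuple

# Each rule pairs a regex with the strptime format used to convert the match.
RULES = [
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "%Y-%m-%d"),            # 1976-03-10
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), "%d/%m/%Y"),        # 20/10/2012 (day first)
    (re.compile(r"\b[A-Z][a-z]+ \d{1,2},? \d{4}\b"), "%B %d %Y"),  # February 5 1960
]

def extract_date(line: str) -> Tuple[Optional[date], str]:
    """Return (date or None, remaining text) for one line of free text."""
    for pattern, fmt in RULES:
        match = pattern.search(line)
        if match is None:
            continue
        candidate = match.group(0).replace(",", "")
        try:
            parsed = datetime.strptime(candidate, fmt).date()
        except ValueError:
            # Looks like a date but is not one, e.g. 33/11/2017: try the next rule.
            continue
        # Whatever surrounds the matched date becomes the text column.
        rest = (line[:match.start()] + line[match.end():]).strip(" ,")
        return parsed, rest
    return None, line.strip()
```

Because the date is cut out of the line wherever it occurs, it does not matter whether the text comes before or after it. Lines such as "2010 traffic accident" or "At age of 5 years" are not covered by these three rules and come back with no date.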
You might want to try using a library like lubridate in R to more easily convert the date string candidates you identified in the text.
But it will be a mess anyway, I wish you good luck.
Regards,
Balázs
@BalazsBarany the date extraction part is one of the easier parts. It's the link between those dates where it becomes trickier. (Which is why I suggested a graph database)
Here's an example of a process that I use for date extraction from text, where the date could be in multiple formats.
https://community.rapidminer.com/t5/Original-Rapid-I-Forum/Extracting-date-from-textfiles/m-p/30203
You could use this with the addition of a predictive model learned on the historically captured text data to add context such as "5 years old = dateofbirth + 5 years".
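As a simpler stand-in for the predictive model mentioned above, the "dateofbirth + 5 years" arithmetic can be sketched with a plain rule in Python. The names `AGE_PATTERN`, `add_years` and `date_from_age` are hypothetical, and the pattern only covers phrasings like "at age of N years"; real notes would need more variants.

```python
import re
from datetime import date
from typing import Optional

# Matches "at age of 5 years" / "at the age of 12 years" (case-insensitive).
AGE_PATTERN = re.compile(r"\bat (?:the )?age of (\d{1,3}) years?\b", re.IGNORECASE)

def add_years(start: date, years: int) -> date:
    """Shift a date by whole years, mapping 29 February to 28 February if needed."""
    try:
        return start.replace(year=start.year + years)
    except ValueError:
        # Only 29 February in a non-leap target year ends up here.
        return start.replace(year=start.year + years, day=28)

def date_from_age(line: str, birth_date: date) -> Optional[date]:
    """Estimate an event date from an age mention and a known date of birth."""
    match = AGE_PATTERN.search(line)
    if match is None:
        return None
    return add_years(birth_date, int(match.group(1)))
```

With the birth date from the example case (February 5 1960), `date_from_age("At age of 5 years, dental surgery", date(1960, 2, 5))` gives 5 February 1965, which is only accurate to within a year and should be flagged as estimated.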
There's probably some good fuzzy matching solution to this... it would be very nice if RapidMiner had some fancy algorithms in some kind of data prep wizard to do that for us, eh? Just sayin', @IngoRM...
Meanwhile I actually was doing something similar a year or so ago - trying to parse out dates from newspaper death notices. Here is the block I created (it's a complete mess but maybe something inside is useful for you? It's a 100% brute force solution)
Scott