Extracting Information using Natural Language Processing
Experts,
I'm trying to extract a few keywords from the attached data sample. Is there a sample process I can refer to that would give me the desired results?
From the attached sample, I need to extract each individual's associations with previous entities from the "notes" column. The following is the output I need from the attached sample. I have around 6K notes like this.
Previous_Associations
(1) Mercury Marine |
(2) Thrivent |
(3) Pride System, Excel Capital |
(4) Rinco |
(5) Aero Network |
Thanks for your time.
Answers
One relatively simple, if somewhat theoretical, approach might be the following:
- Tokenise your content so you get one sentence per line
- Look for sentences that contain defined keywords (previous / was / recently, etc.) within a short distance of any of the given company names
- Ignore the other sentences
- Extract the company names
I'll see if I can find the time to get some sample working.
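In the meantime, here is a rough Python sketch of the steps above, outside of any RapidMiner process. Everything in it is an assumption for illustration: the names `COMPANIES`, `CUE_WORDS`, `max_gap` and both functions are made up, the cue words are only a starting point, and the sentence splitter is deliberately naive (it will break on abbreviations such as "Inc.").

```python
import re

# Hypothetical inputs: supply your own company list and cue words.
COMPANIES = ["Mercury Marine", "Thrivent", "Pride System",
             "Excel Capital", "Rinco", "Aero Network"]
CUE_WORDS = {"previous", "previously", "former", "formerly", "was", "recently"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def previous_associations(note, companies=COMPANIES, cues=CUE_WORDS, max_gap=6):
    """Return companies found within max_gap words of a cue word."""
    found = []
    for sentence in split_sentences(note):
        words = re.findall(r"\w+", sentence.lower())
        cue_positions = [i for i, w in enumerate(words) if w in cues]
        if not cue_positions:
            continue  # ignore sentences without any cue word
        for company in companies:
            name = re.findall(r"\w+", company.lower())
            n = len(name)
            # Every position where the full company name starts.
            starts = [i for i in range(len(words) - n + 1)
                      if words[i:i + n] == name]
            near = any(abs(s - c) <= max_gap
                       for s in starts for c in cue_positions)
            if near and company not in found:
                found.append(company)
    return found


# Invented example note, not taken from the attached sample.
note = "John is the CEO of Rinco. He was previously a director at Mercury Marine."
print(previous_associations(note))  # ['Mercury Marine']
```

Rinco is skipped here because its sentence has no cue word, which is the "ignore the other sentences" step. The `max_gap` value of 6 words is a guess and would need tuning against your real notes.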
NLTK (and other libraries) provide named entity recognition, but you would still have to train them to recognize your brands. On top of that, they cannot tell on their own whether a company relates to a current or a previous position.
You can of course train models to do this for you, but that requires training data, and since you only have 6000 records you would almost have to tag them all manually to get a starter set, which is probably overkill.
It will look for the companies you provide in a simple list, and if one of them appears in a sentence containing one of the defined keywords, it will be extracted as an entity.
This means, of course, that you need to know the companies up front, but you could use some other logic, such as looking for terms like 'founder of / CTO of / worked for', to get most of them up front as well. This is an exercise you will have to do in any case.
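As a possible way to build that starting list, the sketch below scans the notes for such terms and captures the run of capitalised words that follows. It is only an illustration: `CUE_PHRASES`, `CANDIDATE` and `candidate_companies` are hypothetical names, and the capitalisation heuristic will miss lower-case names and pick up non-company words, so the result still needs a manual review.

```python
import re

# Hypothetical cue phrases; extend with whatever wording your notes use.
CUE_PHRASES = ["founder of", "CTO of", "worked for"]

# Cue phrase (matched case-insensitively), then one or more capitalised words.
CANDIDATE = re.compile(
    r"\b(?i:" + "|".join(re.escape(p) for p in CUE_PHRASES) + r")\s+"
    r"([A-Z][\w&-]*(?:\s+[A-Z][\w&-]*)*)"
)


def candidate_companies(notes):
    """Collect distinct candidate names in order of first appearance."""
    seen = []
    for note in notes:
        for name in CANDIDATE.findall(note):
            if name not in seen:
                seen.append(name)
    return seen


# Invented example notes, not taken from the attached sample.
notes = [
    "She worked for Aero Network before joining us.",
    "Founder of Rinco and former CTO of Thrivent.",
]
print(candidate_companies(notes))  # ['Aero Network', 'Rinco', 'Thrivent']
```

The reviewed output can then serve as the company list for the keyword-based extraction.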
But yeah, it remains a challenge indeed...