Need help analyzing medical keywords in a column of free text. Excel does not work here.
Hi everyone,
I need some serious help. I have been working on an Excel file with just one column. It contains data coming from physicians' offices: a string of free text that the doctor writes down when examining a patient. This column holds the diagnosis information. I need to create a model to give this data some structure.
I am specifically trying to identify all conditions that are related to migraine. The way I am doing it in Microsoft Excel is by using the "if, error, search" functions to sniff out the keywords from the table. I need two kinds of keywords:
Includes: all keywords that can indicate migraine.
Excludes: all keywords that, if present, mean the entry can never be a migraine.
Sometimes I have to combine "includes" and "excludes" to find the actual migraine cases. For example:
Includes = Migraine
Excludes = family History of
In this case I am trying to look for a patient with migraine, not someone who has a family history of migraine. So I need to exclude the text "family History of". The rule is: "this string should include this keyword and exclude that keyword".
I think this should be fairly simple in RapidMiner. It is taking me hours and hours of formulas in Excel and driving me crazy, since I have about half a million rows to analyse and too many formulas. The objective is to create a model that I can scale up to other diseases as well.
Can anyone help?
I am attaching a zip file containing the Excel file with some data, as well as some examples of the includes and excludes I am using.
Thanks
Arsalan (MD)
Answers
You could try the "Replace" operator. It allows you to replace your values using regular expression logic. Afterwards it's just a matter of filtering the examples. However, be careful with respect to typos. You may, for example, want to look for just "migr" instead of "Migraine" to decrease the chance of missing something.
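For readers who want to try the stem idea outside RapidMiner, here is a minimal Python sketch. It is not the Replace operator, just a plain regular expression search; the sample notes and the function name mentions_migraine are made up for illustration.

```python
import re

# Assumption: the short stem "migr" is used instead of the full word,
# so that misspellings such as "migrane" are still caught.
MIGRAINE_STEM = re.compile(r"migr", re.IGNORECASE)


def mentions_migraine(text: str) -> bool:
    """Return True if the text contains the stem 'migr' in any letter case."""
    return bool(MIGRAINE_STEM.search(text))


# Hypothetical sample rows, not taken from the attached file.
notes = [
    "Chronic Migraine without aura",
    "pt reports migrane twice a month",
    "Tension headache",
]

matches = [note for note in notes if mentions_migraine(note)]
print(matches)
```

Keep in mind that a short stem also matches unrelated words such as "migratory", so the hits are worth a manual check.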
Hi FBT
Thanks for replying, but I can't figure out how to use the Replace operator.
I tried it, but it does not seem to change anything in my original text.
OK, there is actually a simpler way. You can just use the "Filter Examples" operator and select "contains" together with your specified keyword to filter for what you are looking for. Take a look at the process below (just copy and paste it into your XML tab and press the green checkmark at the top left).
The "No Migraine" filter is just for illustration purposes. You would need to enter whatever your relevant keyword is. If you have more than one, remember to select "match any" within the operator's filter panel.
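The same include/exclude logic can also be sketched in plain Python, which may help when checking results. This is only an illustration under simple assumptions (case-insensitive substring matching, "match any" for both lists); the function name matches_rule and the sample notes are hypothetical.

```python
def matches_rule(text, includes, excludes):
    """Return True if text contains any include keyword and no exclude keyword.

    Matching is a case-insensitive substring test, similar in spirit to a
    "contains" filter with "match any".
    """
    lowered = text.lower()
    has_include = any(keyword.lower() in lowered for keyword in includes)
    has_exclude = any(keyword.lower() in lowered for keyword in excludes)
    return has_include and not has_exclude


# Keyword lists from the example in the question.
includes = ["migraine"]
excludes = ["family history of"]

# Hypothetical sample rows.
notes = [
    "Migraine with aura",
    "Family History of Migraine",
    "Hypertension",
]

kept = [note for note in notes if matches_rule(note, includes, excludes)]
print(kept)
```

One limitation of this sketch: an exclude phrase anywhere in the text drops the whole row, so a note like "Migraine; family history of diabetes" would be excluded as well.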
Super - I have crossed stage 1 with your help. Please check attached file.
Now for stage 2: I need to create separate attributes for each of these keywords. I need a table like the one below.
By doing this I could easily generate aggregates and do my counts. I need help in creating this table.
See if the process below does what you are looking for:
It's based on your sample file again, which does not have an example of Hemicrania, hence I replaced it with Rheumatic Fever for demo purposes. A word of caution: migraine and headache show up simultaneously in one example. This may become a cause of error in your further analysis, hence you may want to set a more elaborate filter to make sure that the right keywords are assigned to such examples.
Oh yeah, this is much better. We break the data into smaller filters, tag them with the keyword, and join them back into one in the end...
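As a rough cross-check outside RapidMiner, the "one attribute per keyword" idea can be sketched in Python. The sketch assumes a 0/1 flag per keyword is what is wanted; the keyword list, the sample notes, and the name keyword_flags are placeholders.

```python
from collections import Counter

# Hypothetical keyword list; replace with the real "includes".
KEYWORDS = ["migraine", "headache", "hemicrania"]


def keyword_flags(text, keywords=KEYWORDS):
    """Return a dict with one 0/1 flag per keyword (case-insensitive substring test)."""
    lowered = text.lower()
    return {keyword: int(keyword in lowered) for keyword in keywords}


# Hypothetical sample rows.
notes = [
    "Migraine headache, recurrent",
    "Tension headache",
    "Hypertension",
]

# One row per note: the original text plus one flag column per keyword.
rows = [{"text": note, **keyword_flags(note)} for note in notes]

# Aggregate: how many notes mention each keyword.
counts = Counter()
for row in rows:
    for keyword in KEYWORDS:
        counts[keyword] += row[keyword]

print(rows)
print(dict(counts))
```

Because each keyword gets its own flag, a note that mentions both migraine and headache is counted under both, which makes the overlap visible instead of hiding it.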
Thank you so much for your help FBT.
Really appreciate it...
Arsalan
Hi,
You could also take a look at MetaMap (https://metamap.nlm.nih.gov/); we have used it with RapidMiner before.
Cheers
Sven