Answers
RapidMiner does not have such capabilities at the moment. I've tried their various information extraction operators on text, but it's very basic. GATE, OpenNLP, Stanford NLP, etc. are some tools you can use to achieve this. Also, if you're comfortable trying another analytics platform, KNIME has been able to integrate some good NLP tools such as NE taggers, text annotators, and other cool operators/nodes.
NamSor is a RapidMiner extension that's able to determine gender, ethnicity, and origin. Maybe that will help.
https://marketplace.rapidminer.com/UpdateServer/faces/product_details.xhtml?productId=rmx_namsor
@batstache611 @Thomas_Ott The Rosette text mining extension (third party but available from the marketplace) does have an operator for "extract entities", and it works with names as well as other entities. You will need to set up a free account with them to test it.
Lindon Ventures
Data Science Consulting from Certified RapidMiner Experts
Yep, there's Rosette too. I haven't spent a lot of time with it but it looks really cool.
NamSor allows you to extract gender and ethnicity information about a name record; it doesn't necessarily help you identify and tag an entity (people, place, organization, etc.) in a body of unstructured text.
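To make the distinction concrete, here is a minimal Python sketch of what "finding names in a body of text" means, as opposed to classifying a name you already have. It is a deliberately naive heuristic (runs of two or more capitalized words), not what Rosette, NamSor, or any of the NLP toolkits mentioned in this thread actually do; the function name `candidate_names` is hypothetical.

```python
import re

# Naive heuristic (an illustration only): treat a run of two or more
# capitalized words as a candidate name. Real NER taggers use trained models.
CANDIDATE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)+\b")

def candidate_names(text):
    """Return capitalized multi-word runs found in free text."""
    return CANDIDATE.findall(text)

found = candidate_names("The report says that Maria Lopez met John Smith in Berlin.")
print(found)
```

A rule like this misses single-word names and will happily pick up capitalized phrases that are not names at all, which is why a proper entity tagger is needed for anything beyond a quick check.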
Thank you Brian,
I have already tried the features of Rosette's API from within RapidMiner and the results aren't very consistent. Entity extraction sometimes picks up garbage text as entities, sentiment analysis isn't any good at handling sarcasm or irony, etc. However, Rosette's biggest drawback is that it expects pre-processed input, i.e. the text has to be in cells in a data table; it cannot work with unstructured documents. I'm willing to understand that as well....
But when it throws me an error such as "Must contain meaningful text" even after I've brought the unstructured text data into a table format, defined the column types in the Data Editor, and told each Rosette operator (tokenize, sentence extract, sentiment, entity extract, names, etc.) which column in the data table contains the text, that's when I start losing my faith in RM's text analytics capabilities.
RapidMiner should really make an effort to integrate native NLP tools based on CoreNLP, GATE, OpenNLP, etc. that can do much more than what the standard Text Processing extension can do at the moment. I mean, with RM being a leader in Gartner's 2016 Magic Quadrant along with SAS and SPSS, one would naturally expect this out of it as it grows. Thank you very much.
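For anyone hitting the same "Must contain meaningful text" error, a minimal sketch of the preprocessing described above, done outside RapidMiner in plain Python: split an unstructured document into one text cell per row and drop rows that contain no letters. Whether blank or symbol-only cells are what actually triggers that error is an assumption, not something confirmed in this thread, and the function name `document_to_rows` and the `text` column name are hypothetical.

```python
import re

def document_to_rows(document):
    """Split raw text into one paragraph per row, skipping rows with no letters."""
    rows = []
    # Paragraphs are assumed to be separated by one or more blank lines.
    for paragraph in re.split(r"\n\s*\n", document):
        text = " ".join(paragraph.split())  # collapse line breaks and extra spaces
        # Keep only rows that contain at least one alphabetic character.
        if any(ch.isalpha() for ch in text):
            rows.append({"text": text})
    return rows

sample = "Maria Lopez joined Acme Corp in 2015.\n\n---\n\n   \n\nShe moved to\nBerlin last year."
rows = document_to_rows(sample)
for row in rows:
    print(row["text"])
```

The resulting rows could then be written to CSV and loaded as a data table, with the `text` column selected in each operator.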
Using the Entity Extraction and Concept Extraction features in the AYLIEN Text Analysis Extension you can extract names from unstructured text.
You can download the extension here.
You can get your free AYLIEN API key here and here is a quick guide on getting started.