RapidMiner does not have such capabilities at the moment. I've tried its various information extraction operators on text, but they're very basic. GATE, OpenNLP, Stanford NLP, etc. are some tools you can use to achieve this. Also, if you're comfortable trying another analytics platform, KNIME has integrated some good NLP tools such as named-entity (NE) taggers, text annotators, and other cool operators/nodes.
NameSor is a RapidMiner extension that can infer gender, ethnicity, and origin from names. Maybe that will help.
@batstache611 @Thomas_Ott The Rosette text mining extension (third-party, but available from the marketplace) does have an "extract entities" operator, and it works with names as well as other entities. You will need to set up a free account with them to test it.
Thank you Brian,
I have already tried the features of Rosette's API from within RapidMiner, and the results aren't very consistent. Entity extraction sometimes picks up garbage text as entities, sentiment analysis doesn't handle sarcasm or irony well, etc. However, Rosette's biggest drawback is that it expects pre-processed input: the text has to be in cells in a data table, and it cannot work with unstructured documents. I could live with that as well....
But when it throws me an error such as "Must contain meaningful text" even after I've brought the unstructured text data into a table format, defined the column types in the Data Editor, and told each Rosette operator (tokenize, sentence extract, sentiment, entity extract, names, etc.) which column in the data table contains the text, that's when I start losing faith in RM's text analytics capabilities.
RapidMiner should really make an effort to integrate native NLP tools based on CoreNLP, GATE, OpenNLP, etc. that can do much more than the standard Text Processing extension can at the moment. With RM being a leader in Gartner's 2016 Magic Quadrant along with SAS and SPSS, one would naturally expect this as it grows. Thank you very much.
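For anyone hitting the same "Must contain meaningful text" error: I don't know what Rosette actually checks, so this is only a guess, but one thing worth ruling out is rows whose text cell is empty, whitespace-only, or nothing but punctuation and digits. Below is a minimal Python sketch of that check, assuming the table has been exported as a list of dicts. The function names, the `text` column name, and the three-letter threshold are all made up for illustration and are not part of Rosette or RapidMiner.

```python
def has_meaningful_text(value, min_letters=3):
    """Heuristic only: a cell is usable if it is a string containing
    at least `min_letters` alphabetic characters (assumed threshold)."""
    if not isinstance(value, str):
        return False
    return sum(ch.isalpha() for ch in value) >= min_letters


def split_rows(rows, column):
    """Split rows into (usable, rejected) based on the text in `column`."""
    usable, rejected = [], []
    for row in rows:
        if has_meaningful_text(row.get(column)):
            usable.append(row)
        else:
            rejected.append(row)
    return usable, rejected


if __name__ == "__main__":
    # Hypothetical sample data standing in for the exported table.
    sample = [
        {"id": 1, "text": "John Smith met Maria Lopez in Boston."},
        {"id": 2, "text": "   "},
        {"id": 3, "text": "12345 --- !!!"},
        {"id": 4, "text": None},
    ]
    usable, rejected = split_rows(sample, "text")
    print("usable ids:", [r["id"] for r in usable])
    print("rejected ids:", [r["id"] for r in rejected])
```

If the rejected list turns out to be non-empty, filtering those rows out before the Rosette operators would at least show whether blank cells are what triggers the error.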