"Medical Dictionary"

Lorenzo Member Posts: 7 Contributor II
edited May 2019 in Help
Hi!
I'm a newbie to RapidMiner and to data mining in general. I'm working with medical and scientific texts, and my goal right now is to pre-process them in a way that is suitable for clustering.
Ideally I want to use a medical/scientific dictionary to help me in the stemming and pre-processing phase, but I don't really know where to look.
I really hope that someone is able to answer these questions:
- What is the content (and the format) of a dictionary to be used in RapidMiner?
- Are there any medical/scientific dictionaries available on the web? Where can I find them?
- If the answer to the previous question is no, where can I find general (non-scientific) dictionaries on the web?
Thanks for your attention!
Lorenzo

Answers

  • IngoRM Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi Lorenzo,

    There is a simple dictionary feature available in RapidMiner which at least supports word replacements, including those using regular expressions. The operator to use is called "DictionaryStemmer". A dictionary in RapidMiner is a file containing matching rules, i.e. each line contains the matching rules for a given entity. A rule is either a term or a regular expression, for example:

    weekday:(.*)day
    car_manufacturer: bmw chrysler ford toyota
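
To make the format above concrete, here is a minimal Python sketch of how such a rule file could be read and applied to tokens. It is not the DictionaryStemmer implementation: the function names `parse_dictionary` and `replace_token` are made up for illustration, and the choices to match whole tokens and to ignore case are assumptions, not documented RapidMiner behavior.

```python
import re

# The two example lines from the post, used here as an in-memory dictionary file.
DICTIONARY = """
weekday:(.*)day
car_manufacturer: bmw chrysler ford toyota
"""

def parse_dictionary(text):
    """Parse lines of the form 'entity: rule1 rule2 ...' (hypothetical helper).

    Returns a list of (entity, [compiled regex, ...]) pairs. A plain term such
    as 'bmw' is treated as a regex that matches only itself.
    """
    entries = []
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank or malformed lines
        entity, _, rules = line.partition(":")
        # Assumption: rules are whitespace-separated and case-insensitive.
        patterns = [re.compile(rule, re.IGNORECASE) for rule in rules.split()]
        entries.append((entity.strip(), patterns))
    return entries

def replace_token(token, entries):
    """Return the entity name if any rule matches the whole token, else the token."""
    for entity, patterns in entries:
        if any(p.fullmatch(token) for p in patterns):
            return entity
    return token

entries = parse_dictionary(DICTIONARY)
tokens = "On Monday I drove a Toyota to the clinic".split()
result = [replace_token(t, entries) for t in tokens]
print(result)
```

Note that a broad rule such as `(.*)day` also matches words like "today" or "holiday", so in this sketch they would be mapped to `weekday` as well; real rules usually need to be tighter.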

    I don't know of any medical dictionaries, but that does not mean much ;) WordNet is a general-purpose dictionary, but we have not actually seen much benefit from using a dictionary for text classification / clustering, so we usually avoid the additional work.

    Cheers,
    Ingo
  • Lorenzo Member Posts: 7 Contributor II
    Thanks, Ingo, for your two posts.
    Your quick answer gave me a direction to go in and saved me a lot of time.
    Really, thanks.