"Medical Dictionary"

Lorenzo Member Posts: 7 Contributor II
edited May 2019 in Help
Hi!
I'm a newbie in RapidMiner and in the world of mining in general. I'm working with medical and scientific texts, and my goal now is to pre-process them in a way that is suitable for clustering.
Ideally I want to use a medical/scientific dictionary to help me in the stemming and pre-processing phase, but I don't really know where to search.
I really hope that someone is able to answer these questions:
- What is the content (and the format) of a dictionary to be used in RapidMiner?
- Are there any medical/scientific dictionaries available on the web? Where can I find them?
- If the answer to the previous question is no, where can I find dictionaries (non-scientific ones, I mean) on the web?
Thanks for your attention!
Lorenzo

Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi Lorenzo,

    there is a simple dictionary feature available in RapidMiner which at least supports word replacements, including those using regular expressions. The operator to use is called "DictionaryStemmer". A dictionary in RapidMiner is a file containing matching rules, i.e. each line contains the matching rules for a given entity. A rule is either a term or a regular expression, for example:

    weekday:(.*)day
    car_manufacturer: bmw chrysler ford toyota
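
To make the rule format above more concrete, here is a minimal Python sketch of how such a file could be interpreted. It is not RapidMiner's implementation of "DictionaryStemmer"; the helper names `parse_dictionary` and `replace_token` are hypothetical, and it assumes that rules on a line are separated by whitespace, that every rule is treated as a regular expression, and that a rule must match the whole token (case-sensitively) for the token to be replaced by the entity name.

```python
import re

# The two example lines from the post, used as an in-memory dictionary file.
DICTIONARY = """\
weekday:(.*)day
car_manufacturer: bmw chrysler ford toyota
"""


def parse_dictionary(text):
    """Parse 'entity: rule1 rule2 ...' lines into (entity, [compiled patterns]).

    Assumption: rules are whitespace-separated and each one is a regex
    (a plain term is simply a regex that matches itself).
    """
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or ":" not in line:
            continue  # skip blank or malformed lines
        entity, _, body = line.partition(":")
        patterns = [re.compile(rule) for rule in body.split()]
        rules.append((entity.strip(), patterns))
    return rules


def replace_token(token, rules):
    """Return the entity name of the first rule matching the whole token,
    or the token unchanged if no rule matches (assumed behaviour)."""
    for entity, patterns in rules:
        if any(p.fullmatch(token) for p in patterns):
            return entity
    return token


rules = parse_dictionary(DICTIONARY)
tokens = "on monday i drove a toyota to work".split()
result = [replace_token(t, rules) for t in tokens]
print(result)
```

Note that under these assumptions a broad rule such as `(.*)day` also maps words like "today" or "holiday" to `weekday`, so regular-expression rules in a real dictionary need to be written with care.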

    I don't know of any medical dictionaries, but that does not mean too much ;) A general dictionary is WordNet, but we actually did not see much success from using a dictionary for text classification / clustering, so we usually avoid the additional work.

    Cheers,
    Ingo
  • Lorenzo Member Posts: 7 Contributor II
    Thanks, Ingo, for your two posts.
    Your immediate answer gave me a direction to go in and saved me a lot of time.
    Really, thanks.