"qualitative measurement for text"

shaihulud Member Posts: 20 Contributor II
edited May 2019 in Help
hi,

I've heard that one needs to use qualitative approaches for data analysis when an object's variables are neither metric nor non-metric of the classification kind, but of an arbitrary text type.
Can somebody tell me a few names of those approaches? I'm also unfamiliar with the terminology of such methods. Are they really called qualitative measurement approaches? It seems a little too abstract, and the source I've read on the topic is pretty vague, so I don't know if this is really what I'm looking for.
I would appreciate any help on the topic.

greetings
shai

Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    sorry, I must admit that I am not really sure what you are after. At least I would not say that "qualitative analysis" is a term I have heard before, although I can imagine what could be meant by it. But that is nothing uncommon: having worked in the field of data analysis for about 15 years now, I have often seen new terms and concepts (for the same old ideas) coming and going again. Often the terms also vary a lot depending on the community which applies data mining.

    So let's try this one: I would think that "qualitative" for text attributes could refer to the way data mining algorithms handle those attributes. As an example, I would name a similarity measure, for instance for clustering schemes, which takes the text into account. This can range from simple ones like just checking for equality (as is already done by the mixed Euclidean distance in RapidMiner, for example) to edit distances, which also take "creative variation" (aka spelling errors  ;) ) into account. A last idea could be to use dictionaries like WordNet and determine whether two terms are actually synonyms and hence could share a higher similarity. Although the last idea might sound nice, I would generally not recommend this approach. Instead, I would add this synonym detection to the preprocessing and calculate term importance (e.g. TF-IDF) afterwards. This would reflect the similarity of the complete texts much better. But it could be an option if the attributes do not really contain texts but only arbitrary single words.

    Hope that helped,
    Ingo
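    For readers who want something concrete, the two simplest measures mentioned above (an exact-equality check and an edit distance) can be sketched in plain Python. This is only an illustrative sketch, not how RapidMiner implements its mixed Euclidean distance; the function names and the choice to normalize by the longer string's length are assumptions made for this example.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions turning a into b."""
    # prev[j] holds the distance between the processed prefix of a and b[:j].
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to the empty string
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute (or keep) the character
            ))
        prev = curr
    return prev[-1]


def exact_similarity(a: str, b: str) -> float:
    """Simplest nominal comparison: 1.0 if identical, else 0.0."""
    return 1.0 if a == b else 0.0


def edit_similarity(a: str, b: str) -> float:
    """Edit distance rescaled to [0, 1]; the normalization by the
    longer string's length is an assumption of this sketch."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest
```

    With these definitions, a spelling variant such as "color" versus "colour" scores 0.0 under the equality check but stays close to 1.0 under the edit-based similarity, which is the "creative variation" effect described in the post.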
  • shaihulud Member Posts: 20 Contributor II
    Hi Ingo,

    thanks, this is exactly what I was asking about. I've also thought about lexical databases like WordNet that take semantic similarities into account. If I may ask, do you know of such approaches? And why wouldn't you recommend this kind of approach?

    greetings
    shai
  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,751 RM Founder
    Hi,

    I did not say that I do not recommend this in general, but that I would perform this disambiguation and term mapping during preprocessing, before a normalization step like calculating TF-IDF. That way, the similarity of the complete texts is simply taken into account much better.

    Cheers,
    Ingo
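    The ordering suggested above (map synonyms first, compute TF-IDF afterwards, then compare whole texts) could look roughly like the sketch below. The tiny `SYNONYMS` dictionary is a hypothetical hand-written stand-in for a real lexical database such as WordNet, whitespace tokenization is a simplification, and the smoothed IDF formula is just one common variant chosen for this example.

```python
import math
from collections import Counter

# Hypothetical synonym map standing in for a WordNet lookup.
SYNONYMS = {"automobile": "car", "auto": "car", "purchase": "buy"}


def normalize(text, synonyms):
    """Lowercase, split on whitespace, and map each token to its
    canonical synonym (the preprocessing step)."""
    return [synonyms.get(tok, tok) for tok in text.lower().split()]


def tfidf_vectors(docs, synonyms):
    """Return one sparse TF-IDF vector (dict term -> weight) per document."""
    tokenized = [normalize(d, synonyms) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for toks in tokenized for term in set(toks))
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vectors.append({
            # Relative term frequency times a smoothed IDF (assumed variant).
            t: (c / len(toks)) * (math.log((1 + n) / (1 + df[t])) + 1)
            for t, c in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


docs = [
    "I want to buy a car",
    "I want to purchase an automobile",
    "the weather is nice",
]
mapped = tfidf_vectors(docs, SYNONYMS)  # synonyms merged before TF-IDF
plain = tfidf_vectors(docs, {})         # no synonym mapping
```

    Comparing `cosine(mapped[0], mapped[1])` with `cosine(plain[0], plain[1])` shows the first two sentences becoming more similar once "purchase" and "automobile" are mapped before the term weights are computed.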
  • shaihulud Member Posts: 20 Contributor II
    Yes, that makes sense. Thanks for the discussion.