"qualitativ measurement for text"
Hi,
I've heard that one needs to use qualitative approaches for data analysis when the objects' variables are neither metric nor nonmetric in the classification sense, but of an arbitrary text type.
Can somebody tell me a few names of those approaches? I'm also unfamiliar with the terminology of such methods. Are they really called qualitative measurement approaches? That seems a little too abstract, and the source I've read on the topic is pretty vague, so I don't know if this is really what I'm looking for.
I would appreciate any help on the topic.
greetings
shai
Answers
Sorry, I must admit that I am not really sure what you are after. I would not say that "qualitative analysis" is a term I have heard before, although I can imagine what could be meant by it. That is nothing unusual: having worked in the field of data analysis for about 15 years now, I have often seen new terms and concepts (for the same old ideas) come and go. The terms also often vary a lot depending on the community that applies data mining.
So let's try this one: I would think that "qualitative" for text attributes could refer to the way data mining algorithms handle those attributes. One example is a similarity measure, for instance for clustering schemes, that takes the text into account. This can range from simple measures that just check for equality (as is already done by the mixed Euclidean distance in RapidMiner, for example) to edit distances, which also take "creative variation" (aka spelling errors) into account. A last idea could be to use dictionaries like WordNet to determine whether two terms are actually synonyms and hence should share a higher similarity. Although that last idea might sound nice, I would generally not recommend this approach. Instead, I would add the synonym detection to the preprocessing and calculate term importance (e.g. TF-IDF) afterwards. This would reflect the similarity of the complete texts much better. But it could be an option if the attributes do not really contain texts but only arbitrary single words.
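To make the range from equality checks to edit distances concrete, here is a minimal Python sketch. It is not RapidMiner's implementation; the function names are made up for illustration, and the edit distance shown is the standard Levenshtein distance, normalized by the longer string's length (one common choice among several).

```python
def exact_match_similarity(a: str, b: str) -> float:
    """Simplest nominal comparison: 1.0 if the values are equal, else 0.0."""
    return 1.0 if a == b else 0.0


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: minimum number of single-character
    insertions, deletions, or substitutions that turn a into b."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute (or match)
            ))
        previous = current
    return previous[-1]


def edit_similarity(a: str, b: str) -> float:
    """Map the edit distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - edit_distance(a, b) / longest
```

With these definitions, a spelling variant such as "color" versus "colour" gets an exact-match similarity of 0.0 but an edit similarity of about 0.83, which is the tolerance for "creative variation" described above.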
Hope that helped,
Ingo
Thanks, this is exactly what I was asking about. I've also thought about lexical databases like WordNet that take semantic similarities into account. If I may ask, do you know of any such approaches? And why wouldn't you recommend this kind of approach?
greetings
shai
I did not say that I would not recommend this in general, only that I would perform this disambiguation and term mapping during preprocessing, before a normalization step like calculating TF-IDF. That way the similarity of the complete text is simply taken into account much better.
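A rough Python sketch of that ordering, assuming a tiny hand-written synonym table in place of a real lexical database such as WordNet: terms are mapped to a canonical form first, and TF-IDF weights and cosine similarity are computed afterwards. The `SYNONYMS` table and all function names are hypothetical, and the TF-IDF variant (relative term frequency times log inverse document frequency) is just one common formulation.

```python
import math
from collections import Counter

# Hypothetical synonym table for illustration only; in practice the
# mapping might be derived from a lexical database such as WordNet.
SYNONYMS = {"automobile": "car", "vehicle": "car", "quick": "fast"}


def preprocess(text: str) -> list[str]:
    """Lowercase, split on whitespace, and map synonyms to one canonical term."""
    return [SYNONYMS.get(token, token) for token in text.lower().split()]


def tfidf_vectors(documents: list[str]) -> list[dict[str, float]]:
    """Compute sparse TF-IDF vectors AFTER the synonym mapping."""
    tokenized = [preprocess(doc) for doc in documents]
    n_docs = len(tokenized)
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))  # count each term once per document
    vectors = []
    for tokens in tokenized:
        term_freq = Counter(tokens)
        vectors.append({
            term: (count / len(tokens)) * math.log(n_docs / doc_freq[term])
            for term, count in term_freq.items()
        })
    return vectors


def cosine(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors."""
    dot = sum(weight * v.get(term, 0.0) for term, weight in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


docs = ["fast car", "quick automobile", "slow train"]
vectors = tfidf_vectors(docs)
```

Here "fast car" and "quick automobile" end up with identical vectors because the mapping happens before weighting; without the `SYNONYMS` step the two texts would share no terms at all and their cosine similarity would be zero.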
Cheers,
Ingo