Distance measures of text attributes
Hi
I've read that the distance measure used by most cluster analysis algorithms merely checks whether the text attributes of two objects a and b are identical. In other words, it counts how many text attributes have the same value. Don't these algorithms take string similarity into account? For example, if object a has an attribute x with the value "car" and object b has the attribute x with the value "cars", are they evaluated as a match?
By the way, am I in the right section for this kind of question?
Thanks for the help.
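To make the question concrete, here is a minimal sketch of the exact-match behaviour described above: it counts the nominal attributes whose values differ. This is an illustration under that assumption, not RapidMiner's actual implementation, and `nominal_distance` is a hypothetical helper name.

```python
def nominal_distance(a: dict[str, str], b: dict[str, str]) -> int:
    """Count the attributes of a whose value in b differs (exact string comparison)."""
    # Values are compared as whole strings, so "car" and "cars" are simply unequal.
    # An attribute missing from b (b.get returns None) also counts as a mismatch.
    return sum(1 for key in a if a[key] != b.get(key))


obj_a = {"x": "car", "color": "red"}
obj_b = {"x": "cars", "color": "red"}
print(nominal_distance(obj_a, obj_b))  # 1
```

Under this measure "car" vs. "cars" is exactly as far apart as "car" vs. "banana"; any notion of partial similarity has to come from a different measure.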
Answers
I would really love to read some answers to my question. Furthermore, I would like to know whether anybody knows of distance measure approaches for cluster analysis that take semantics into account, for example, matching an attribute value 'car' to an attribute value 'automobile'.
Guys, I would really appreciate any help you can give me on these distance measurement topics.
Greetings
Then, you would calculate the distance between documents based on their TF-IDF term scores, generally using the cosine similarity measure.
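As a rough illustration of the document-level approach, the sketch below builds TF-IDF vectors and compares them with cosine similarity in plain Python. It assumes one common weighting variant (raw term count times log(N / document frequency)). RapidMiner's text processing may normalize differently, and the function names are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[list[str]]) -> list[dict[str, float]]:
    """Weight each term by its raw count times log(N / document frequency)."""
    n = len(docs)
    # Document frequency: in how many documents each term occurs at least once.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({t: count * math.log(n / df[t]) for t, count in tf.items()})
    return vectors


def cosine_similarity(u: dict[str, float], v: dict[str, float]) -> float:
    """Cosine of the angle between two sparse term-weight vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # an all-zero vector has no direction to compare
    return dot / (norm_u * norm_v)


docs = [
    "the red car is fast".split(),
    "the blue car is slow".split(),
    "cats sleep all day".split(),
]
vecs = tfidf_vectors(docs)
print(cosine_similarity(vecs[0], vecs[1]))  # positive: shared terms "the", "car", "is"
print(cosine_similarity(vecs[0], vecs[2]))  # 0.0: no shared terms
```

A cosine distance can then be taken as 1 minus the similarity. Note that this still matches terms exactly, so "car" and "cars" only line up if stemming is applied beforehand.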
But if you're trying to calculate the distance between terms rather than documents, I would look into the Levenshtein edit distance, which I believe is not (yet) implemented in RapidMiner.
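Because the answer suggests Levenshtein may not be available in RapidMiner, here is a minimal standalone sketch of the classic dynamic-programming formulation in plain Python. It is offered as an illustration, not as a RapidMiner feature.

```python
def levenshtein(s: str, t: str) -> int:
    """Minimum number of single-character insertions, deletions and substitutions turning s into t."""
    # prev[j] holds the distance between the previous prefix of s and t[:j].
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]  # turning s[:i] into "" takes i deletions
        for j, ct in enumerate(t, start=1):
            cost = 0 if cs == ct else 1
            curr.append(min(prev[j] + 1,          # delete cs
                            curr[j - 1] + 1,      # insert ct
                            prev[j - 1] + cost))  # substitute, or keep a match
        prev = curr
    return prev[-1]


print(levenshtein("car", "cars"))        # 1
print(levenshtein("kitten", "sitting"))  # 3
```

This captures the "car" vs. "cars" case from the question, but it is purely character-based: "car" and "automobile" stay far apart (at least 7 edits, since their lengths differ by 7), so it does not address semantic matching.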