"How to calculate the distance between two text documents?"

gfyang Member Posts: 29 Maven
edited June 2019 in Help
Hi,

Suppose I have two text documents, d1 and d2. I can build a vector for each and read them through an Iterator<Example>. How do I then calculate the distance or similarity between them, for example the cosine distance? Does RM provide an operator or function for this?

Thank you very much.

Sincerely yours,
gfyang

Answers

  • fischer Member Posts: 439 Maven
    Hi,

    Yes. Please look into com.rapidminer.tools.math.similarity.DistanceMeasure.

    Cheers,
    Simon
  • gfyang Member Posts: 29 Maven
    Hi,

    Thanks a lot for the reply. However, it is still not clear to me. Could you please give some Java code?
    I tried the following, but it failed:

    ExampleSet ex=...
    Example ex1 = ex.getExample(1);
    Example ex2 = ex.getExample(2);
    DistanceMeasure myDis = new DistanceMeasure();
    double dis = myDis.calculateDistance(ex1, ex2);
    It reported that DistanceMeasure could not be instantiated.

    Thank you.

    Sincerely yours,
    gfyang
  • fischer Member Posts: 439 Maven
    Hi,

    DistanceMeasure is abstract. You can only instantiate its subclasses.

    Also, if you are using a distance measure in an operator, try using a DistanceMeasureHelper.

    Cheers,
    Simon
  • gfyang Member Posts: 29 Maven
    Hi,

    I see. The subclasses work well. Thank you.

    Sincerely yours,
    gfyang
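
For readers who land on this thread, here is a minimal, self-contained sketch of the two points discussed above: an abstract distance class cannot be instantiated directly, and a concrete subclass can compute the cosine distance. It does not use RapidMiner at all. The names SketchDistanceMeasure, SketchCosineDistance and DistanceSketchDemo are hypothetical stand-ins, plain double[] arrays replace Example objects, and the term weights are made up. The real subclasses of DistanceMeasure and their method signatures should be checked in the RapidMiner sources.

```java
// Hypothetical stand-in for an abstract distance measure.
// This is NOT RapidMiner's DistanceMeasure; it only mirrors the idea.
abstract class SketchDistanceMeasure {
    /** Returns a distance between two vectors of equal length. */
    abstract double calculateDistance(double[] a, double[] b);
}

// Concrete subclass (assumed name): cosine distance = 1 - cosine similarity.
class SketchCosineDistance extends SketchDistanceMeasure {
    @Override
    double calculateDistance(double[] a, double[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same length");
        }
        double dot = 0.0;
        double normA = 0.0;
        double normB = 0.0;
        for (int i = 0; i < a.length; i++) {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }
        if (normA == 0.0 || normB == 0.0) {
            // The cosine is undefined when either vector is all zeros.
            return Double.NaN;
        }
        return 1.0 - dot / (Math.sqrt(normA) * Math.sqrt(normB));
    }
}

class DistanceSketchDemo {
    public static void main(String[] args) {
        // "new SketchDistanceMeasure()" would not compile: the class is abstract.
        SketchDistanceMeasure measure = new SketchCosineDistance();

        // Made-up term weights for two documents d1 and d2.
        double[] d1 = {1.0, 2.0, 0.0};
        double[] d2 = {0.0, 1.0, 3.0};

        System.out.println("cosine distance = " + measure.calculateDistance(d1, d2));
    }
}
```

With this definition, vectors pointing in the same direction have a distance close to 0 and orthogonal vectors have a distance of 1. Returning NaN for an all-zero vector is one possible design choice; throwing an exception would be equally reasonable.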