One Problem

kinkounio Member Posts: 9 Contributor II
edited November 2018 in Help
I have one file with many records and I want to compare it against a second file that holds a single record. The result should be the one record from the first file that is closest to the record in the second file.

How can I do this?

Answers

  • IngoRM Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, Community Manager, RMResearcher, Member, University Professor Posts: 1,597  RM Founder
    Hi,

    This question has come up a few times over the last few days. Here are the answers:

    You have two options.

    1. Load the data sets and merge them. Calculate a similarity measure for the merged data set. Filter out the pairs that do not include your single example. Sort the rest. Use the one with the highest similarity. All the necessary operators are part of RapidMiner.

    2. If the amount of data is rather large, calculating the full similarity matrix is probably not feasible. In that case, you have to iterate over the examples: for each current example, calculate the similarity with your single example of interest and store it via ProcessLog. Afterwards you can convert the process log back to a data set, sort it, etc.
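The two options above can be sketched outside RapidMiner as follows. This is only an illustrative Python sketch, not a RapidMiner process: the function names, the similarity measure (1 / (1 + Euclidean distance)) and the sample vectors are all assumptions made for the example.

```python
import math

def similarity(a, b):
    """Assumed measure: 1.0 for identical vectors, approaching 0 as they move apart."""
    return 1.0 / (1.0 + math.dist(a, b))

def most_similar_full_matrix(rows, query):
    """Option 1: merge, score every pair, keep only pairs with the query, sort."""
    merged = rows + [query]
    q = len(merged) - 1  # index of the single example after merging
    pairs = [(i, j, similarity(merged[i], merged[j]))
             for i in range(len(merged))
             for j in range(i + 1, len(merged))]
    with_query = [p for p in pairs if q in (p[0], p[1])]
    with_query.sort(key=lambda p: p[2], reverse=True)
    best = with_query[0]
    # Return the index of the partner of the query in the best pair.
    return best[0] if best[1] == q else best[1]

def most_similar_iterative(rows, query):
    """Option 2: one pass over the examples, logging one similarity per example."""
    log = [(i, similarity(row, query)) for i, row in enumerate(rows)]
    log.sort(key=lambda entry: entry[1], reverse=True)
    return log[0][0]

rows = [[1.0, 2.0], [4.0, 4.0], [9.0, 1.0]]  # made-up data for illustration
query = [4.0, 5.0]
print(most_similar_full_matrix(rows, query), most_similar_iterative(rows, query))
```

Both functions return the same row index. The first scores every pair in the merged set, so its cost grows quadratically with the number of examples, while the second computes only one similarity per example, which is why it suits larger data sets.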

    Cheers,
    Ingo
  • kinkounio Member Posts: 9 Contributor II
    Good morning.

    Where are those similar posts?

    Thanks.
  • kinkounio Member Posts: 9 Contributor II
    Hi.

    I want to compare two files.

    historik.txt

    1 73 15 16 13 14 15
    2 123 25 26 23 24 25
    3 173 35 36 33 34 35
    4 224 45 46 43 44 46
    5 274 55 56 53 54 56

    dades.txt

    25 26 23 24 25

    The correct result would be the second row of the first file. Value: 123

    With the code below the result is not correct: it returns 73. What am I doing wrong?

    <operator name="Root" class="Process" expanded="yes">
        <parameter key="resultfile" value="/home/rm_workspace/p2/resultat.res"/>
        <operator name="InputHistorik" class="ExampleSource">
            <parameter key="attributes" value="/home/rm_workspace/p2/historik.aml"/>
        </operator>
        <operator name="FeatureRangeRemoval" class="FeatureRangeRemoval">
            <parameter key="first_attribute" value="1"/>
            <parameter key="last_attribute" value="1"/>
        </operator>
        <operator name="NearestNeighbors" class="NearestNeighbors">
        </operator>
        <operator name="Diari" class="ExampleSource">
            <parameter key="attributes" value="/home/rm_workspace/p2/dades.aml"/>
        </operator>
        <operator name="ModelApplier" class="ModelApplier">
            <list key="application_parameters">
            </list>
        </operator>
    </operator>

    The AML files:

    dades.aml
    <?xml version="1.0" encoding="UTF-8"?>
    <attributeset default_source="dades.dat">
      <attribute
        name      = "dades.txt (1)"
        sourcecol  = "1"
        valuetype  = "integer"/>

      <attribute
        name      = "dades.txt (2)"
        sourcecol  = "2"
        valuetype  = "integer"/>

      <attribute
        name      = "dades.txt (3)"
        sourcecol  = "3"
        valuetype  = "integer"/>

      <attribute
        name      = "dades.txt (4)"
        sourcecol  = "4"
        valuetype  = "integer"/>

      <attribute
        name      = "dades.txt (5)"
        sourcecol  = "5"
        valuetype  = "integer"/>

    </attributeset>

    historik.aml

    <?xml version="1.0" encoding="UTF-8"?>
    <attributeset default_source="historik.dat">
      <attribute
        name      = "historik.txt (1)"
        sourcecol  = "1"
        valuetype  = "integer"/>

      <label
        name      = "historik.txt (2)"
        sourcecol  = "2"
        valuetype  = "integer"/>

      <cluster
        name      = "historik.txt (3)"
        sourcecol  = "3"
        valuetype  = "integer"/>

      <attribute
        name      = "historik.txt (4)"
        sourcecol  = "4"
        valuetype  = "integer"/>

      <attribute
        name      = "historik.txt (5)"
        sourcecol  = "5"
        valuetype  = "integer"/>

      <attribute
        name      = "historik.txt (6)"
        sourcecol  = "6"
        valuetype  = "integer"/>

      <attribute
        name      = "historik.txt (7)"
        sourcecol  = "7"
        valuetype  = "integer"/>

    </attributeset>

    How can I do it?

    Thanks.
  • haddock Member Posts: 849  Guru
    Hi,

    The answer to your problem is that for some reason only known to yourself you call column three a cluster!

    <cluster
        name      = "historik.txt (3)"
        sourcecol  = "3"
        valuetype  = "integer"/>

    I've laid out the data in one file like this...

    1 73 15 16 13 14 15
    2 123 25 26 23 24 25
    3 173 35 36 33 34 35
    4 224 45 46 43 44 46
    5 274 55 56 53 54 56
    6 ?  25 26 23 24 25


    and made the necessary code changes to this...
    <operator name="Root" class="Process" expanded="yes">
        <parameter key="resultfile" value="/home/rm_workspace/p2/resultat.res"/>
        <operator name="InputHistorik" class="ExampleSource">
            <parameter key="attributes" value="C:\Program Files (x86)\Rapid-I\RapidMiner-4.3\historik"/>
        </operator>
        <operator name="NearestNeighbors" class="NearestNeighbors">
        </operator>
        <operator name="InputHistorik (2)" class="ExampleSource">
            <parameter key="attributes" value="C:\Program Files (x86)\Rapid-I\RapidMiner-4.3\historik"/>
        </operator>
        <operator name="ExampleFilter" class="ExampleFilter">
            <parameter key="condition_class" value="missing_labels"/>
        </operator>
        <operator name="ModelApplier" class="ModelApplier">
            <list key="application_parameters">
            </list>
        </operator>
    </operator>
    and rather unsurprisingly the correct answer emerges.

    So the answer to "How can I do it?" is:

    With more care!
  • kinkounio Member Posts: 9 Contributor II
    Hi, haddock.

    Your code is not the solution. I want to compare attributes 3-7 of file 1 with the attributes of file 2, and the result should be attribute 2 of file 1.

    The "cluster" column was a mistake on my part.

    I want to obtain one value from the second column of file 1: the value from the row whose attributes match those of file 2.

    In my example, comparing the two files should give the second column of the second row of file 1.
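The comparison described here can be sketched in a few lines of Python. This is only an illustration of the intended result, not a RapidMiner process: it assumes whitespace-separated files with the contents posted earlier, uses Euclidean distance as the closeness measure, and the names `parse` and `closest_value` are made up for the example.

```python
import math

# File contents as posted earlier in the thread.
HISTORIK = """\
1 73 15 16 13 14 15
2 123 25 26 23 24 25
3 173 35 36 33 34 35
4 224 45 46 43 44 46
5 274 55 56 53 54 56
"""
DADES = "25 26 23 24 25\n"

def parse(text):
    """Turn whitespace-separated lines into rows of floats, skipping blank lines."""
    return [[float(x) for x in line.split()]
            for line in text.splitlines() if line.strip()]

def closest_value(historik_text, dades_text):
    """Return column 2 of the historik row whose columns 3-7 are nearest to dades."""
    history = parse(historik_text)
    query = parse(dades_text)[0]
    # row[2:7] selects columns 3-7 (zero-based slice); row[1] is column 2.
    best = min(history, key=lambda row: math.dist(row[2:7], query))
    return int(best[1])

print(closest_value(HISTORIK, DADES))  # prints 123
```

Columns 3-7 of the second row are identical to the single row in dades.txt, so the distance is zero and the value returned is 123.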

    Thanks.
  • haddock Member Posts: 849  Guru
    "The correct result would be the second row of the first file. Value: 123"
    To make it even easier for you to comprehend I've put the data into CSV form, then we don't need AML files at all. So here is the data...

    1, 73, 15, 16, 13, 14,15
    2, 123, 25, 26, 23,24, 25
    3, 173, 35, 36, 33, 34, 35
    4, 224, 45, 46, 43, 44, 46
    5, 274, 55, 56,53, 54, 56
    6,    , 25, 26, 23, 24, 25

    For the same reason I've taken out the second data read and replaced it with a datacopy, like this...
    <operator name="Root" class="Process" expanded="yes">
        <operator name="CSVExampleSource" class="CSVExampleSource" breakpoints="after">
            <parameter key="filename" value="C:\Users\CJFP\Documents\rm_workspace\historik.txt"/>
            <parameter key="read_attribute_names" value="false"/>
            <parameter key="label_column" value="2"/>
            <parameter key="id_column" value="1"/>
        </operator>
        <operator name="IOMultiplier" class="IOMultiplier">
            <parameter key="io_object" value="ExampleSet"/>
        </operator>
        <operator name="ExampleFilter" class="ExampleFilter">
            <parameter key="condition_class" value="missing_labels"/>
            <parameter key="invert_filter" value="true"/>
        </operator>
        <operator name="NearestNeighbors" class="NearestNeighbors">
        </operator>
        <operator name="IOSelector" class="IOSelector">
            <parameter key="io_object" value="ExampleSet"/>
        </operator>
        <operator name="ExampleFilter (2)" class="ExampleFilter">
            <parameter key="condition_class" value="missing_labels"/>
        </operator>
        <operator name="ModelApplier" class="ModelApplier">
            <list key="application_parameters">
            </list>
        </operator>
    </operator>
    If I run this I get "123" as the answer, just like before, so I'm puzzled as to what you mean by the following:
    "Your code is not the solution. I want to compare attributes 3-7 of file 1 with the attributes of file 2, and the result should be attribute 2 of file 1."
    Perhaps you could enlighten us?
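For readers without RapidMiner at hand, the logic of the process above (read one table, train on the rows that have a label, apply the model to the row whose label is missing) can be approximated in Python. This is a rough sketch under assumptions, not the operators' actual implementation: it assumes a single nearest neighbour (k = 1) with Euclidean distance, and the names `load` and `predict_missing` are hypothetical.

```python
import csv
import io
import math

# Same data as above: column 1 is the id, column 2 the label (blank in row 6).
CSV_TEXT = """\
1, 73, 15, 16, 13, 14,15
2, 123, 25, 26, 23,24, 25
3, 173, 35, 36, 33, 34, 35
4, 224, 45, 46, 43, 44, 46
5, 274, 55, 56,53, 54, 56
6,    , 25, 26, 23, 24, 25
"""

def load(text):
    """Parse rows into (id, label or None, feature list) tuples."""
    examples = []
    for record in csv.reader(io.StringIO(text)):
        cells = [cell.strip() for cell in record]
        label = float(cells[1]) if cells[1] else None
        examples.append((cells[0], label, [float(c) for c in cells[2:]]))
    return examples

def predict_missing(examples):
    """Give each unlabelled example the label of its nearest labelled example."""
    train = [e for e in examples if e[1] is not None]   # inverted missing-label filter
    todo = [e for e in examples if e[1] is None]        # missing-label filter
    predictions = {}
    for ex_id, _, features in todo:
        nearest = min(train, key=lambda e: math.dist(e[2], features))
        predictions[ex_id] = nearest[1]
    return predictions

print(predict_missing(load(CSV_TEXT)))  # prints {'6': 123.0}
```

Splitting one table on the missing label mirrors the two ExampleFilter operators in the process: one branch trains, the other is scored.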
  • kinkounio Member Posts: 9 Contributor II
    Hi,
    haddock, thanks.

    I will try it.