How to delete single examples

colo Member Posts: 236 Maven
edited November 2018 in Help
Hello everybody,

after some time of development I am facing a pretty simple problem that I don't know how to solve. Maybe I am too preoccupied with other problems to see a simple solution.
While iterating over all examples, some turn up that should be removed. Is there any way to do this, or will I have to create a new example set and add only the examples I want to keep?

Thanks and best regards
Matthias

Answers

  • wessel Member Posts: 537 Maven
    Hey,

    Have you tried the "Filter Example Range" operator?
    There is also the "Filter Examples" operator (e.g. to remove all examples with id=40).

    Best regards,

    Wessel
  • colo Member Posts: 236 Maven
    Hi wessel,

    thanks for the hint. I already took a look at the code of both operators. But "Filter Example Range" works with SplittedExampleSet, which does not make much sense in my case.
    "Filter Examples" uses ConditionedExampleSet instead, which works by modifying the mapping somehow. But I can't figure out where the private mapping array is actually applied, so I can't reproduce this for my iteration. I would like to remove an example as soon as it is evaluated as invalid during iteration. I could set a value like "remove" and use the filter operator to delete all examples containing this value after my iteration. But this isn't really nice, and I thought there should be a possibility to delete a single example directly without having to use some sort of filtering to re-detect it (since it is currently being processed).

    Any further idea?

    Best regards
    Matthias
  • wessel Member Posts: 537 Maven
    Hey,

    You can set the weight of this example to 0?

    Or you can do some bookkeeping yourself?

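    For example, in the code where you later use the data set, do something like:
    for (Example e : exampleSet) {
        // x is the id of an example you marked as removed
        if (e.getId() == x) {
            continue; // skip this example instead of processing it
        }
        // process e as usual
    }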

    Best regards,

    Wessel
  • wessel Member Posts: 537 Maven
    Hey,

    Let's say you use plain Java and you have a double[][].
    There is no really good way to delete an entry there either.
    Only if you store your data as a linked list can you delete entries efficiently.

    So some extra bookkeeping is probably your best option.

    Best regards,

    Wessel
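
To illustrate the linked-list point above, here is a minimal plain-Java sketch, not RapidMiner code. The class `RowRemoval` and the "first value is negative" criterion are assumptions made up for this example; each row stands in for one example.

```java
import java.util.Iterator;
import java.util.LinkedList;
import java.util.List;

// Hypothetical stand-in for a data set: each row is a double[].
final class RowRemoval {

    // Removes every row whose first value is negative while iterating.
    // Iterator.remove() deletes the element most recently returned by next().
    static void removeInvalid(List<double[]> rows) {
        Iterator<double[]> it = rows.iterator();
        while (it.hasNext()) {
            double[] row = it.next();
            if (row[0] < 0) {   // assumed "invalid" criterion
                it.remove();    // constant time for a LinkedList
            }
        }
    }

    public static void main(String[] args) {
        List<double[]> rows = new LinkedList<>();
        rows.add(new double[] {1.0, 2.0});
        rows.add(new double[] {-1.0, 5.0});
        rows.add(new double[] {3.0, 4.0});
        removeInvalid(rows);
        System.out.println(rows.size()); // prints 2
    }
}
```

The same method also works on an `ArrayList`, but each removal then shifts all later elements, which is the inefficiency mentioned for arrays.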
  • colo Member Posts: 236 Maven
    Hi wessel,

    thanks again for your answers.

    Of course deleting some entries somewhere in the middle of large arrays isn't very efficient. But I thought the example set data structure might provide some functionality to achieve this (and solve it more efficiently internally).

    Keeping track of the ids of the examples I want to keep is quite easy. The class ConditionedExampleSet does this by filling an array with the valid ids and somehow uses this as a new mapping. I would like to do this in a similar way, but I have no idea how to apply such a new mapping. I couldn't figure it out, despite reading the Javadoc and the code as carefully as possible (time is getting tight as my thesis deadline approaches ;))

    Or did you have something else in mind when talking about bookkeeping?

    For now I am just setting a special string value that is filtered out by "Filter Examples" afterwards. Not really nice, but it has to do the job for now :(

    Best regards
    Matthias
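
The mapping idea described above can be sketched in plain Java as follows. This is only an illustration of the general technique, not the actual implementation of ConditionedExampleSet; the class `MappedView` and its methods are hypothetical names invented for this example.

```java
import java.util.Arrays;
import java.util.function.Predicate;

// Hypothetical view class; NOT RapidMiner's ConditionedExampleSet.
final class MappedView {
    private final double[][] table; // underlying data, never modified
    private final int[] mapping;    // view index -> row index in the table

    private MappedView(double[][] table, int[] mapping) {
        this.table = table;
        this.mapping = mapping;
    }

    // Creates a view that exposes every row of the table.
    static MappedView all(double[][] table) {
        int[] m = new int[table.length];
        for (int i = 0; i < m.length; i++) {
            m[i] = i;
        }
        return new MappedView(table, m);
    }

    int size() {
        return mapping.length;
    }

    // Every access goes through the mapping array.
    double[] row(int viewIndex) {
        return table[mapping[viewIndex]];
    }

    // One pass: collect the indices of valid rows, then return a narrower view.
    MappedView keepIf(Predicate<double[]> valid) {
        int[] kept = new int[mapping.length];
        int n = 0;
        for (int idx : mapping) {
            if (valid.test(table[idx])) {
                kept[n++] = idx;
            }
        }
        return new MappedView(table, Arrays.copyOf(kept, n));
    }
}
```

For example, `MappedView.all(data).keepIf(r -> r[0] >= 0)` yields a view without the rows whose first value is negative, while `data` itself stays unchanged.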
  • marcin_blachnik Member Posts: 61 Guru
    Please look at mblachnik.pl/rm_exampleset.zip, where you can download two ExampleSet classes suitable for your problem. One is called EditedExampleSet, and the second SelectedExamleSet. They are based on binary data indexing. There is also an example of usage.
  • Marco_Boeck Administrator, Moderator, Employee, Member, University Professor Posts: 1,993 RM Engineering
    Hi,

    there is currently no way in RapidMiner to really delete examples from an ExampleSet while you iterate over the set.
    ExampleSets provide merely a view on the data provided by the ExampleTable.
    Your options are either to create something suitable to your needs yourself like marcin.blachnik posted above, to create a new ExampleTable from your edited DataRows and then create a new ExampleSet on that table, or use something like filtering (as you did).

    Regards,
    Marco
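
The "build a new table from the rows you keep" option can be sketched in plain Java as below. It deliberately avoids the RapidMiner API: `TableRebuild`, `rebuild`, and the marker-value criterion are assumptions for illustration only, with a `double[][]` standing in for the ExampleTable.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper; a double[][] stands in for the underlying table.
final class TableRebuild {

    // Iterates once over the old rows and copies the ones to keep
    // into a fresh table. The old table is left untouched.
    static double[][] rebuild(double[][] oldTable, double invalidMarker) {
        List<double[]> kept = new ArrayList<>();
        for (double[] row : oldTable) {
            if (row[0] != invalidMarker) { // assumed criterion: first value marks invalid rows
                kept.add(row.clone());     // copy so the new table is independent
            }
        }
        return kept.toArray(new double[0][]);
    }
}
```

Compared with a filtering view, this costs one copy of the kept rows but leaves a compact table with no trace of the removed entries.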