Best practice for adding new record(s) to an example set
JEdward
RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 578 Unicorn
Hello,
I'm going to feel pretty silly asking this, but what is the best practice for adding a record to an example set & setting the ID field to the next in the series?
For example, if I want to add "James" to the table below with an ID value of 5, what would be the best way to add that next record using RapidMiner?
ID | Names |
1 | Jack |
2 | Lucy |
3 | Simon |
4 | George |
In SQL I'd set the ID column to autonumber so that new entries get the next value, but I want to do this using only RapidMiner, because otherwise I would lose the metadata. If I aggregate to find the maximum ID in the table, I could then use that value + 1, but I'm not sure this is the best way; it seems like an extra step. I suppose it would also be possible to use SQL autonumbering through a pseudo-table that holds nothing but ID numbers, but this seems needlessly excessive (although for millions of records it could have some speed advantages).
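To make the "maximum ID + 1" idea concrete, here is a minimal sketch in plain Python rather than RapidMiner. The list of dictionaries is only a stand-in for the example set, and the function name `add_with_next_id` is made up for illustration.

```python
# Hypothetical in-memory stand-in for the example set shown above.
rows = [
    {"ID": 1, "Names": "Jack"},
    {"ID": 2, "Names": "Lucy"},
    {"ID": 3, "Names": "Simon"},
    {"ID": 4, "Names": "George"},
]


def add_with_next_id(table, name):
    """Append a record whose ID is one more than the current maximum ID."""
    # default=0 makes the first record of an empty table get ID 1.
    next_id = max((row["ID"] for row in table), default=0) + 1
    table.append({"ID": next_id, "Names": name})
    return next_id


new_id = add_with_next_id(rows, "James")
print(new_id, rows[-1])
```

Scanning for the maximum is the "extra step" mentioned above: it costs one pass over the table for every insert, but it still works when the existing IDs have gaps.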
Any opinions?
Answers
I've done something like this before, and I've included an example. Basically, it uses Append to join the example sets together and Extract Macro to work out how many examples are in the new example set. The Set Data operator then allows the ID of the last example to be explicitly set to the next value in the sequence. I don't know for certain, but this last operator is probably quite slow, so care would be needed with large volumes of data.
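The logic of those three steps can be sketched in plain Python. This is not the RapidMiner process itself, only an illustration of the idea; the function name `append_and_set_last_id` is invented, and the sketch assumes the existing IDs run from 1 to n without gaps, so that the example count equals the next ID.

```python
def append_and_set_last_id(existing, new_record):
    """Mimic Append, then count the examples, then set the last example's ID."""
    # "Append": join the two sets (copies, so the inputs are left untouched).
    combined = [dict(row) for row in existing] + [dict(new_record)]
    # "Extract Macro": number of examples in the combined set.
    example_count = len(combined)
    # "Set Data": write the count into the ID of the last example.
    combined[-1]["ID"] = example_count
    return combined


existing = [
    {"ID": 1, "Names": "Jack"},
    {"ID": 2, "Names": "Lucy"},
    {"ID": 3, "Names": "Simon"},
    {"ID": 4, "Names": "George"},
]
result = append_and_set_last_id(existing, {"ID": None, "Names": "James"})
print(result[-1])
```

If IDs have ever been deleted, the count no longer matches the highest ID, and this approach could hand out a value that is already in use.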
If this is a problem, or if you wanted to append multiple new examples, you could modify the process so that the generated IDs start at an offset equal to the number of examples in the original set plus 1. The Set Data operator would not be needed in this case.
I'll leave that as an exercise for the reader.
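As a starting point for that exercise, the offset idea might look like this in plain Python. It is a sketch under the same assumption as before (existing IDs are 1 to n with no gaps); `append_with_offset` and the second new name, "Anna", are made up for illustration.

```python
def append_with_offset(existing, new_names):
    """Number new records starting at (number of existing examples + 1)."""
    offset = len(existing) + 1
    new_rows = [
        {"ID": new_id, "Names": name}
        for new_id, name in enumerate(new_names, start=offset)
    ]
    # No per-row update afterwards: the IDs are generated before appending.
    return existing + new_rows


original = [
    {"ID": 1, "Names": "Jack"},
    {"ID": 2, "Names": "Lucy"},
    {"ID": 3, "Names": "Simon"},
    {"ID": 4, "Names": "George"},
]
combined = append_with_offset(original, ["James", "Anna"])
print([row["ID"] for row in combined])
```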
regards
Andrew
I'm reluctant to use Append, as it states explicitly that it is not suited for merging large example sets. My own way of doing it currently is the process below. It takes a name, checks whether it already exists in the data table, creates it if it does not, and returns the matching record (with its ID) from the data table.
It's a little complicated :) What do you and the community think?
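The behaviour described in this post (look the name up, create it only if it is missing, return the record with its ID) can be sketched in plain Python as follows. This illustrates the logic only and is not the RapidMiner process; `get_or_create` is a hypothetical name, and new IDs are assumed to be the current maximum plus 1.

```python
def get_or_create(table, name):
    """Return the record for `name`, creating it with the next ID if missing."""
    for row in table:
        if row["Names"] == name:
            return row  # already present: nothing is added
    next_id = max((row["ID"] for row in table), default=0) + 1
    record = {"ID": next_id, "Names": name}
    table.append(record)
    return record


people = [
    {"ID": 1, "Names": "Jack"},
    {"ID": 2, "Names": "Lucy"},
    {"ID": 3, "Names": "Simon"},
    {"ID": 4, "Names": "George"},
]
found = get_or_create(people, "Lucy")     # existing name
created = get_or_create(people, "James")  # new name
print(found, created)
```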
Your process does more than my example, because mine always assumes the new record is to be added. That makes yours more complex. It also contains an operator I don't have, so I can't run it.
However, I can see a cunning use of Joins to find out whether the new record already exists and, if not, to create it with the new ID. The subsequent outer Join may build everything in memory, so we would need to run an experiment to see whether it outperforms Append.
regards
Andrew