how to combine 2 example sets where both have some missing values?
This has been frustrating the !!!$@#Q$ OUT OF ME!!!
2 different sets, similar attributes, IDs match up fine.. so why isn't there an option to combine them and use the example from whichever one isn't missing?!??!
-
Join left? NOPE!
-
Join right? NOPE!
-
Join outer? NOPE!
-
Append? NOPE
-
Union? NOPE!
-
Superset? NOPE!
What am I missing?!?! HELP!
The datasets look like this:
A B C
1 x y ?
2 x y ?
3 ? z z
Answers
Hello
Could you provide the 2 example sets that are input and also what you need as the output. Currently, I'm only seeing one example set as input
Regards
Andrew
As @Andrew said, a complete example makes it easier for community members to help you. Having said that, I believe the building block union append that another user created could help to solve your issue.
Still not clear - if you could include the 2 example sets you have as input and a manually created example of the desired output you will find that you will get more rapid and focussed help.
Andrew
For example, I run the same dataset through an API, but because I run out of API calls some examples are missing. So I run the data through a different API while I wait, which also runs out and leaves some missing fields.
Then I run the APIs again later, or run a 3rd API, and end up with a bunch of different files with almost all the same info but some missing examples, and no easy way to just 'merge' the values that aren't missing when I'm trying to combine them all!!! (And I can't simply remove the attribute because some of the rows DO have data.)
Union Append is "nice", but it creates a similar problem where I have 2 rows, sometimes with exactly the same data! Which makes processing hard...
Hope somebody can gimme advice!
good morning @781194025 - ok I see what you are asking and yes, "Union Append" will not help unless the attributes are the same name and type. You will need to rename your attributes such as "polarity_header" and "polarity" so that they are both the same before Union Append. And they will need to also be the same type (i.e. both nominal). IMO the operators are not allowing you to do what you want for a good reason - you would never want data science software to treat two attributes from different example sets as the same without the user ensuring that they are indeed the same.
Scott
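To make the "same name and same type" requirement concrete, here is a rough sketch in plain Python (not RapidMiner itself). The helper names `rename_attributes`, `schema` and `mismatches` are made up for illustration, and the sample values are invented; only the attribute names "polarity_header" and "polarity" come from the post above.

```python
def rename_attributes(rows, mapping):
    """Return copies of the rows with attribute names replaced per mapping."""
    return [{mapping.get(name, name): value for name, value in row.items()}
            for row in rows]

def schema(rows):
    """Map each attribute name to the set of type names seen (None = missing)."""
    seen = {}
    for row in rows:
        for name, value in row.items():
            types = seen.setdefault(name, set())
            if value is not None:
                types.add(type(value).__name__)
    return seen

def mismatches(a, b):
    """List the reasons why two sets cannot be appended as-is."""
    sa, sb = schema(a), schema(b)
    problems = [f"only in one set: {n}" for n in sorted(set(sa) ^ set(sb))]
    for n in sorted(set(sa) & set(sb)):
        if sa[n] and sb[n] and sa[n] != sb[n]:
            problems.append(f"type differs: {n}")
    return problems

# Hypothetical sample data; None stands for a missing value.
set_a = [{"id": 1, "polarity": "positive"}, {"id": 2, "polarity": None}]
set_b = [{"id": 1, "polarity_header": None}, {"id": 2, "polarity_header": "negative"}]

before = mismatches(set_a, set_b)
set_b = rename_attributes(set_b, {"polarity_header": "polarity"})
after = mismatches(set_a, set_b)
```

The point of the check is the same as in the post: the rename is an explicit decision by the user that the two attributes really mean the same thing.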
I just want to join 2 example sets by ID and have the missing field come from whichever set has the information!!
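For reference, the behaviour being asked for here is usually called a "coalesce" merge: match rows by ID and, for each attribute, take whichever value is not missing. Below is a minimal sketch in plain Python, not a RapidMiner process. It assumes "?" marks a missing value, that each set is keyed by a unique ID, and that the left set wins when both sides have a value. The function name `coalesce_by_id` and the contents of `second` are invented for illustration; `first` is the table from the opening post.

```python
MISSING = "?"  # assumed marker for a missing value

def coalesce_by_id(left, right, missing=MISSING):
    """Merge two {id: {attribute: value}} tables, preferring non-missing values.

    If both sides have a value for the same cell, the left one is kept.
    """
    merged = {}
    all_ids = list(left) + [i for i in right if i not in left]
    for row_id in all_ids:
        a = left.get(row_id, {})
        b = right.get(row_id, {})
        attrs = list(a) + [name for name in b if name not in a]
        row = {}
        for name in attrs:
            va = a.get(name, missing)
            vb = b.get(name, missing)
            row[name] = va if va != missing else vb
        merged[row_id] = row
    return merged

first = {
    1: {"A": "x", "B": "y", "C": "?"},
    2: {"A": "x", "B": "y", "C": "?"},
    3: {"A": "?", "B": "z", "C": "z"},
}
# Hypothetical second set with different gaps and one extra ID.
second = {
    1: {"A": "x", "B": "?", "C": "w"},
    3: {"A": "x", "B": "z", "C": "?"},
    4: {"A": "v", "B": "?", "C": "?"},
}

merged = coalesce_by_id(first, second)
```

Cells that are missing in both sets stay "?", and IDs that appear in only one set are carried over unchanged.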
I have a first table : Computer, Info 1, Info 2
and a second table : Computer, Info 3, Info 4
I need to merge all the data in a same table to obtain :
Computer, Info 1, Info 2, Info 3, Info 4
Example: In the first table we have:
S64xxx, A1, A2
S64xxx, B1, B2
S64xxx, C1, C2
In the second table we have:
S64xxx, D1, D2
S64xxx, E1, E2
I want to have this result:
S64xxx, A1, A2, D1, D2
S64xxx, B1, B2, E1, E2
S64xxx, C1, C2, ?, ?
Thanks for help
Just try to help.
The information you provide is not consistent.
In the example above you don't have a unique key (element Computer, data S64xxx) for the entries to match. According to what you show, the 'unique key' would be the position of the record.
On the other hand, in the previous statements you talk about running the same dataset against various APIs, which in turn returned different values. Now you would like to combine all of them back into a single set again. So I assume that Info 1 and Info 3, and respectively Info 2 and Info 4, are the same (have the same purpose).
So the first point would be to clarify whether your data has a unique key.
If you have a unique key, try the following:
(Before you talked about another API, so possibly repeat steps 2-6)
In case you don't have a unique key
(or something that you can use as one), you need to check whether the sequence is the same (and ensure this); then you possibly have another option: use the 'ID' / 'Row No.' as the unique key. But be 1000% sure the sets are in the same sequence.
Otherwise it might become very tricky.
Hope that helps.
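The two checks above (is the key unique, and if not, can row position stand in for it) can be sketched in plain Python as follows. This is only an illustration under the assumption that both sets really are in the same sequence; the helper names `duplicate_keys` and `join_by_position` are made up, and the rows are the Computer example from earlier in the thread.

```python
from collections import Counter

def duplicate_keys(rows, key):
    """Return the key values that occur more than once (empty list = unique key)."""
    counts = Counter(row[key] for row in rows)
    return sorted(k for k, n in counts.items() if n > 1)

def join_by_position(left, right, key, missing="?"):
    """Left-join two row lists by row position, filling gaps with `missing`.

    Only safe if both lists are known to be in the same order.
    """
    extra = [name for name in right[0] if name != key] if right else []
    joined = []
    for i, row in enumerate(left):
        match = right[i] if i < len(right) else {}
        joined.append({**row, **{name: match.get(name, missing) for name in extra}})
    return joined

first = [
    {"Computer": "S64xxx", "Info 1": "A1", "Info 2": "A2"},
    {"Computer": "S64xxx", "Info 1": "B1", "Info 2": "B2"},
    {"Computer": "S64xxx", "Info 1": "C1", "Info 2": "C2"},
]
second = [
    {"Computer": "S64xxx", "Info 3": "D1", "Info 4": "D2"},
    {"Computer": "S64xxx", "Info 3": "E1", "Info 4": "E2"},
]

not_unique = duplicate_keys(first, "Computer")   # "Computer" repeats, so it is not a key
result = join_by_position(first, second, "Computer")
```

If `duplicate_keys` returns an empty list, a normal join on that attribute is the safer route; the positional join is the fallback only.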
hello @781194025 - so as you can see, several people including myself are trying to help you as best as we can, but clearly there is a disconnect between our help and you solving your problem. I must say that I cannot help further without really seeing 1) your XML process, and 2) your dataset. This is the norm here on the community. If you could please post them in this thread (rather than us going to some random link to download, etc...), I'd be happy to try to help further.
Scott
@781194025
Okay, I see you've spread yourself around the forum quite a bit, but this thread seems to be the main one.
Can I just summarize your problem a little more clearly based on my understanding?
If this is correct then the following would do it:
The Union Append is probably not necessary, but it seems you're really keen to use it so let's keep it in.
I'm not going to give you a full walkthrough here as the forum seems to have become quite clogged. But feel free to PM me and we can chat further if you're struggling.