RapidMiner

Broken backwards compatibility for Join operator on 8.2

Status: Needs Info

We recently discovered a change in the behaviour of the "Join" operator. In 8.2, joining without a key suddenly throws a user error. In the past we used this functionality to generate Cartesian products of example sets (e.g. joining meta data to all examples). Since no compatibility level was added to the operator, all old processes that rely on this now crash, which is a disaster in deployed scenarios.
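To make the use case concrete, here is a minimal sketch in plain Python of what a keyless join amounts to: every example is paired with every meta data row. The function name `keyless_join` and the sample rows are made up for illustration; this is not how RapidMiner implements the operator.

```python
from itertools import product


def keyless_join(left, right):
    """Pair every row of `left` with every row of `right` (a cross join).

    Rows are plain dicts; on a name clash the value from `right` wins.
    """
    return [{**l, **r} for l, r in product(left, right)]


# Hypothetical example set and a single meta data row.
examples = [{"id": 1, "value": 0.5}, {"id": 2, "value": 0.7}]
meta = [{"source": "sensor_a", "unit": "V"}]

joined = keyless_join(examples, meta)
for row in joined:
    print(row)
```

With a single meta data row the result has exactly as many rows as the example set, each one extended by the meta data columns.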

 

A compatibility level would be highly appreciated if some fundamental behaviour of any operator is removed or changed.

 

Fun fact: the "Join" operator is still tagged "cartesian" and shows up before "Cartesian Product" in the operator list.

Comments
RM Staff

Hi,

 

you are right that the Join operator should no longer be tagged as Cartesian since empty key attributes are not supported. We will delete this tag for the upcoming 9.0 release.

 

We were not able to reproduce your backwards compatibility problem. Can you please attach a process where Join works without key attributes before 8.2 and not after?

 

Just out of curiosity: Why do you prefer using the Join operator for a cartesian product when the specialised Cartesian Product operator is much faster? Is there a problem with the Cartesian Product operator?

 

RM Partner

Hey @gmeier,

thanks for the feedback. Since switching between 7.x and 8.2 always gets my RapidMiner account blocked (same problem as described here), I was only able to investigate further today.

 

It seems I missed that the new Join operator has a different operator class. My example process with the old Join operator indeed still works in 8.2, with the operator itself marked as deprecated. This invalidates my main concern, so sorry for the fuss.

 

Still, I would really like to be able to use the new Join the way I always did. Having worked a lot with databases, I always thought of the operation I was doing as a full outer join rather than the mathematical Cartesian product (or cross join). It also resonated with the RapidMiner way of designing processes, where you "add" information to a dataset: add specialized information with a "join with id", add general information with a "join without id". Using a different operator would break this mental model for me. For the same reasons I never tried the Cartesian Product operator, so I cannot comment on the difference in speed.
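As a rough sketch of that mental model in plain Python: one "add information" step that matches rows when a key is given and attaches the extra rows to everything when it is not. The function `add_information`, the key name `id` and the sample rows are hypothetical, and the keyed branch is a simple inner join for brevity, not a statement about what either RapidMiner operator does internally.

```python
from itertools import product


def add_information(data, extra, key=None):
    """Add the columns of `extra` to the rows of `data`.

    key=None  -> "join without id": every extra row is attached to every data row.
    key="..." -> "join with id": only rows with equal key values are combined
                 (an inner join in this sketch).
    """
    if key is None:
        return [{**d, **e} for d, e in product(data, extra)]

    # Index the extra rows by key value so each data row is matched in one lookup.
    lookup = {}
    for e in extra:
        lookup.setdefault(e[key], []).append(e)
    return [{**d, **e} for d in data for e in lookup.get(d[key], [])]


data = [{"id": 1, "x": 10}, {"id": 2, "x": 20}]
per_id = [{"id": 1, "label": "a"}]   # specialized information
general = [{"batch": "B7"}]          # general information

with_id = add_information(data, per_id, key="id")
without_id = add_information(data, general)
print(with_id)
print(without_id)
```

Both calls go through the same entry point, which is the point of the argument above: from the user's side only the presence of a key changes.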