Superset Operator Tips

JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 563   Unicorn
edited November 2018 in Help
Hi all,

I have a process that uses the Superset operator to add attributes to my dataset (around 25,000 empty attributes each time).
However, the operator is quite a bottleneck in the process.

Does anyone have any best practices for adding large numbers of attributes to a dataset efficiently? 

Thanks,
JEdward

Answers

  • DocMusher Member Posts: 242   Unicorn
    Hi,
    Have you tried using the Radoop extension with Hive for that? The process is not exactly the same, but the speed Radoop provides at least gave me the feeling that I did not have to wait longer than I can normally tolerate. I don't think you need to build your entire process externally. Ideally, the time-consuming processes would run externally and the rest on your local machine.
    Cheers
    Sven
  • mschmitz Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert, University Professor Posts: 2,046  RM Data Scientist
    Hi,

    Have you tried adding a Materialize Data operator? It might be a bit faster afterwards.

    Cheers,
    Martin
    - Head of Data Science Services at RapidMiner -
    Dortmund, Germany
  • JEdward RapidMiner Certified Analyst, RapidMiner Certified Expert, Member Posts: 563   Unicorn
    Thanks guys,

    Martin, good point on materialise data.  I'll give that a try. 

    Sven, I will be moving this project onto Hadoop eventually, but for now I'm stuck in RDBMS land.