
Question about parallel data cleansing

user194372 Member Posts: 14 Contributor II

Hello, everyone.


I need the following information for my project.

(Data cleansing includes handling missing values, outliers, error correction, scaling, binning, necessary transformations, etc., all of which is done before the main analysis.)

My question is: does RapidMiner support parallel data cleansing?

I would also like to know which operators and which parameters support parallel cleansing.

Thank you and have a nice day.

Answers

  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Hi,

    RapidMiner executes process steps (operators) sequentially, one after another, because by default it assumes that each step relies on the results of the previous ones.

    Some operators are internally parallelized if the algorithm is suitable for it. This is the case, for example, in Cross Validation or Random Forest. The preprocessing steps you described are usually not very suitable for parallelization.

    So the answer to your question is: RapidMiner supports parallel execution, but only a few operators are actually implemented in a parallel way.

    If you have a lot of data and the different data cleansing steps don't rely on each other, you can use background execution in Studio or jobs on AI Hub to execute multiple processes at once.

    Regards,

    Balázs
  • user194372 Member Posts: 14 Contributor II

    Dear Balazs

    Thank you for your nice explanation!

    It helped me a lot.

    May I ask a further question?

    How can I use background execution in Studio or jobs on AI Hub to run multiple processes at once if those cleansing steps don't rely on each other?


    Thank you and have a nice week.

  • BalazsBarany Administrator, Moderator, Employee, RapidMiner Certified Analyst, RapidMiner Certified Expert Posts: 955 Unicorn
    Hello!

    In RapidMiner Studio, you simply open the process and, instead of running it directly, select "Run Process in Background". This opens the Background Monitor panel, where you can see the status of the processes running in the background.

    For AI Hub, this video explains it:
    https://academy.rapidminer.com/learn/video/scaling-ai-hub-execution

    Regards,
    Balázs
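
The general idea discussed in this thread, running cleansing steps at the same time when they don't rely on each other, can be illustrated outside RapidMiner with a small Python sketch. This is not RapidMiner's API and not how Studio or AI Hub work internally; the function names (`fill_missing`, `min_max_scale`, `clean_columns`) and the sample data are hypothetical. It uses a thread pool from the standard library to keep the example simple; for CPU-heavy work in pure Python, `ProcessPoolExecutor` would probably be the better choice.

```python
from concurrent.futures import ThreadPoolExecutor
from statistics import mean


def fill_missing(values):
    """Replace None entries with the mean of the observed values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]


def min_max_scale(values):
    """Scale values linearly to the range [0, 1]."""
    lo, hi = min(values), max(values)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in values]


def clean_columns(columns, jobs):
    """Run one independent cleansing job per column concurrently.

    columns: dict mapping column name -> list of values
    jobs:    dict mapping column name -> cleansing function
    (Both structures are hypothetical, chosen only for this sketch.)
    """
    with ThreadPoolExecutor() as pool:
        # Each job touches only its own column, so no job depends on another.
        futures = {name: pool.submit(jobs[name], columns[name]) for name in jobs}
        return {name: future.result() for name, future in futures.items()}


if __name__ == "__main__":
    data = {"age": [20, None, 40], "income": [10.0, 20.0, 30.0]}
    cleaned = clean_columns(data, {"age": fill_missing, "income": min_max_scale})
    print(cleaned)
```

If one step needed the output of another (for example, scaling a column only after its missing values were filled), those two steps would have to run in sequence, which matches the default sequential behaviour described in the first answer.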