How to manage resources on a Hadoop cluster when working on it through RapidMiner Radoop
Submitting and managing Hadoop jobs via Radoop is controlled through the Advanced Parameters of the Radoop connection:
1. The queue to which a MapReduce job is submitted.
Advanced Hadoop Parameters:
key is "mapreduce.job.queuename"
The value must match one of the queues defined in mapred-queues.xml for the system. Before specifying a queue, ensure that the system is configured with that queue and that you are allowed to submit jobs to it.
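A sketch of such an Advanced Hadoop Parameters entry (the queue name "analytics" is a placeholder; substitute a queue that actually exists on your cluster):

```
key:   mapreduce.job.queuename
value: analytics
```

On many clusters the configured queues can be listed with `mapred queue -list` before choosing one.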
2. The YARN resource queue that the client uses to submit a Spark application.
Advanced Spark Parameters:
key is "spark.yarn.queue"
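As with the MapReduce queue, this is a key/value entry; a sketch with a placeholder queue name:

```
key:   spark.yarn.queue
value: analytics
```

If this property is not set, Spark submits to the queue named "default".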
3. For Hive:
Advanced Hive Parameters:
Hive on Tez:
key is "hive.server2.tez.default.queues"
Hive on MapReduce or Spark:
key is "mapreduce.job.queuename"
Number of Tez sessions per queue:
key is "hive.server2.tez.sessions.per.default.queue"
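A hedged sketch of these Advanced Hive Parameters (the queue names and the session count are placeholders chosen for illustration):

```
key:   hive.server2.tez.default.queues
value: analytics,etl

key:   hive.server2.tez.sessions.per.default.queue
value: 2
```

hive.server2.tez.default.queues accepts a comma-separated list of YARN queues, and hive.server2.tez.sessions.per.default.queue controls how many Tez sessions HiveServer2 keeps open for each of those queues.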
4. For Spark:
The default setting for "Spark Resource Allocation Policy" in the Radoop connection is "Static, Heuristic Configuration".
Recommended: "Dynamic Resource Allocation". With this policy a job starts with a few small containers and scales up only if required.
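Radoop applies this policy itself when it is selected; to illustrate what the policy means, a sketch of the equivalent standard Spark properties (the executor counts are illustrative assumptions):

```
key:   spark.dynamicAllocation.enabled
value: true

key:   spark.shuffle.service.enabled
value: true

key:   spark.dynamicAllocation.minExecutors
value: 1

key:   spark.dynamicAllocation.maxExecutors
value: 10
```

Dynamic allocation requires the external shuffle service so that executors can be released without losing shuffle data.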
5. For SparkRM:
SparkRM operators have custom settings as operator parameters that can be adjusted to the subprocess's requirements: "cluster resources %" and "executor memory %" can be tuned.
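To make the effect of these percentage knobs concrete, here is a minimal sketch of how they might translate into a memory request. The formula and all numbers are illustrative assumptions, not Radoop's exact internal computation:

```python
# Hedged sketch: how percentage-style knobs like SparkRM's
# "cluster resources %" and "executor memory %" could map to a
# concrete resource request. Illustrative only.

def sparkrm_request(cluster_mem_gb: float,
                    cluster_resources_pct: float,
                    executor_mem_pct: float) -> dict:
    """Estimate the memory footprint of a SparkRM subprocess."""
    # Fraction of the whole cluster the subprocess may claim.
    total = cluster_mem_gb * cluster_resources_pct / 100.0
    # Portion of that claim that goes to executor memory;
    # the remainder is left as overhead.
    executor = total * executor_mem_pct / 100.0
    overhead = total - executor
    return {"total_gb": total, "executor_gb": executor, "overhead_gb": overhead}

# Example: 200 GB cluster, use 50% of it, 80% of that for executors.
print(sparkrm_request(200, 50, 80))
```

With these illustrative inputs the subprocess would claim 100 GB in total, 80 GB of it for executor memory.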