"Classloader problem integrating Hadoop to Rapidminer"
Dear Gentlemen,
First of all, many thanks for your amazing work on RapidMiner. As everybody says, the RapidMiner team is made up of superheroes.
I'm creating an extension to integrate RapidMiner with Hadoop, Mahout, Hive and so on, and I get the exception shown below when I try to submit a job.
Note that the class org.apache.hadoop.security.UserGroupInformation$HadoopLoginModule is in fact inside the extension JAR, together with the other dependencies, and they all work fine when run from a plain public static void main program.
java.lang.RuntimeException: java.io.IOException: failure to login
at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:546)
at org.apache.hadoop.mapred.FileInputFormat.setInputPaths(FileInputFormat.java:318)
at com.rapidminer.operator.rmahout.clustering.KMeans.doWork(KMeans.java:116)
at com.rapidminer.operator.Operator.execute(Operator.java:834)
at com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
at com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:711)
at com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:379)
at com.rapidminer.operator.rmahout.configuration.MastersNode.doWork(MastersNode.java:51)
at com.rapidminer.operator.Operator.execute(Operator.java:834)
at com.rapidminer.operator.execution.SimpleUnitExecutor.execute(SimpleUnitExecutor.java:51)
at com.rapidminer.operator.ExecutionUnit.execute(ExecutionUnit.java:711)
at com.rapidminer.operator.OperatorChain.doWork(OperatorChain.java:379)
at com.rapidminer.operator.Operator.execute(Operator.java:834)
at com.rapidminer.Process.run(Process.java:925)
at com.rapidminer.Process.run(Process.java:848)
at com.rapidminer.Process.run(Process.java:807)
at com.rapidminer.Process.run(Process.java:802)
at com.rapidminer.Process.run(Process.java:792)
at com.rapidminer.gui.ProcessThread.run(ProcessThread.java:63)
Caused by: java.io.IOException: failure to login
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:490)
at org.apache.hadoop.security.UserGroupInformation.getCurrentUser(UserGroupInformation.java:452)
at org.apache.hadoop.fs.FileSystem$Cache$Key.<init>(FileSystem.java:1494)
at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1395)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:254)
at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:123)
at org.apache.hadoop.mapred.JobConf.getWorkingDirectory(JobConf.java:542)
... 18 more
Caused by: javax.security.auth.login.LoginException: unable to find LoginModule class: org.apache.hadoop.security.UserGroupInformation$HadoopLoginModule
at javax.security.auth.login.LoginContext.invoke(Unknown Source)
at javax.security.auth.login.LoginContext.access$000(Unknown Source)
at javax.security.auth.login.LoginContext$5.run(Unknown Source)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.login.LoginContext.invokeCreatorPriv(Unknown Source)
at javax.security.auth.login.LoginContext.login(Unknown Source)
at org.apache.hadoop.security.UserGroupInformation.getLoginUser(UserGroupInformation.java:471)
As far as I could check, this is a class loader problem. Even if I put the dependencies inside the Rapidminer\lib directory, things still go wrong.
Find below my Operator.doWork() code:
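// Imports needed at the top of the operator class:
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;

public void doWork() throws OperatorException {
    // ...
    // Point the client at the HDFS NameNode and the MapReduce JobTracker.
    Configuration config = new Configuration();
    config.set("fs.default.name", "hdfs://" + host + ":" + hdfsPort);
    config.set("mapred.job.tracker", host + ":" + mapredPort);
    JobConf job = new JobConf(config);
    job.setJarByClass(org.apache.mahout.clustering.syntheticcontrol.kmeans.Job.class);
    job.setJobName("K-Means");
    // The stack trace above starts at this call (KMeans.java:116).
    FileInputFormat.setInputPaths(job, new Path("/user/beckmann/testdata"));
    FileOutputFormat.setOutputPath(job, new Path("b"));
    JobClient.runJob(job); // submits the job and blocks until it finishes
}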
Do you have any idea how to fix it?
Thanks in advance,
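One way to test the class loader hypothesis is to ask each loader directly whether it can see the class named in the LoginException. The sketch below is only an illustration: LoaderCheck and describe are hypothetical names, not part of RapidMiner or Hadoop. It assumes the check is run from code inside the extension (for example from doWork()), so that the "own" loader is the plugin class loader.

```java
import java.security.CodeSource;

public class LoaderCheck {

    /**
     * Tries to resolve className through the given loader (null means the
     * bootstrap loader) and reports where the class was found, if at all.
     */
    public static String describe(String className, ClassLoader loader) {
        try {
            // initialize=false: resolve the class without running static initializers
            Class<?> c = Class.forName(className, false, loader);
            CodeSource src = c.getProtectionDomain().getCodeSource();
            return "found in " + (src == null ? "bootstrap/unknown location" : src.getLocation());
        } catch (ClassNotFoundException | LinkageError e) {
            return "NOT found (" + e + ")";
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0 ? args[0]
                : "org.apache.hadoop.security.UserGroupInformation$HadoopLoginModule";
        // Loader that loaded this class (the plugin loader when run inside the extension).
        ClassLoader own = LoaderCheck.class.getClassLoader();
        // Loader attached to the running thread; framework code often looks classes up here.
        ClassLoader context = Thread.currentThread().getContextClassLoader();
        System.out.println("own loader:     " + describe(name, own));
        System.out.println("context loader: " + describe(name, context));
    }
}
```

If the own loader reports the class as found while the context loader does not, the JAR is packaged correctly and the lookup is simply going through a loader that cannot see the extension.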
Answers
I figured out that this problem is not related to class loading, and in fact it is not a RapidMiner problem.
The problem is a reported bug (https://issues.apache.org/jira/browse/HADOOP-7982) that was fixed in Hadoop 1.1.2.
When I moved from Hadoop 1.0.4 to 1.1.2, the problem disappeared, and my work is moving ahead.
I'll let you know when everything is done.
Best regards,
Work on the "RapidMiner Hadoop extension" is moving ahead, and it will certainly be a 100% open-source extension, like the other Hadoop-related components before it.
Unfortunately, it will not be ready in time for RCOMM 2013.
Just to let you know, and to help others avoid pitfalls: the Hadoop-related components have several security constraints,
and some class definitions and security contexts must live in the main class loader, not in the plugin class loader; otherwise we see strange behavior
during plugin execution.
As a workaround, for now I have put all of the JARs that Hadoop depends on inside rapidminer.jar (as other components did),
but it would be great to do this the right way and avoid creating a "proprietary" rapidminer.jar.
Does anyone know how to do this?
Best regards,
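One technique that is often used in plugin environments, and that might avoid repackaging rapidminer.jar, is to install the plugin's class loader as the thread context class loader for the duration of the Hadoop calls. This is a sketch under assumptions, not a confirmed fix for RapidMiner: it assumes the failing lookups go through the thread context class loader (as far as I know, javax.security.auth.login.LoginContext resolves login modules that way), and ContextLoaderSwap and callWith are hypothetical names introduced only for illustration.

```java
import java.util.concurrent.Callable;

public class ContextLoaderSwap {

    /**
     * Runs the task with the given loader installed as the thread context
     * class loader and restores the previous loader afterwards.
     */
    public static <T> T callWith(ClassLoader loader, Callable<T> task) throws Exception {
        Thread current = Thread.currentThread();
        ClassLoader previous = current.getContextClassLoader();
        current.setContextClassLoader(loader);
        try {
            return task.call();
        } finally {
            // Always restore the original loader, even if the task throws.
            current.setContextClassLoader(previous);
        }
    }

    public static void main(String[] args) throws Exception {
        // Inside an operator this would be the plugin's loader,
        // e.g. getClass().getClassLoader(); here it is just this class's loader.
        ClassLoader pluginLoader = ContextLoaderSwap.class.getClassLoader();
        String seen = callWith(pluginLoader,
                () -> String.valueOf(Thread.currentThread().getContextClassLoader()));
        System.out.println("context loader during the call: " + seen);
    }
}
```

In doWork(), the Hadoop-specific statements (building the JobConf and calling JobClient.runJob) would go inside the task passed to callWith. Whether this covers every security constraint mentioned above is untested here.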