Contributor I m_tarhsaz

Radoop Full Test (Spark job) test error, Hadoop 2.8, Spark 2.1.1

I installed Hadoop 2.8 and Spark 2.1.1 (single node) in a VM, along with RapidMiner 7.5.001 and Radoop 7.5.

 

I selected "Apache Hadoop 2.2+" in the Radoop connection.

 

I validated the Spark installation with SparkPi.
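For reference, the SparkPi validation was along these lines (a sketch; the examples jar path is the Spark 2.1.1 default and can differ per build):

    spark-submit --class org.apache.spark.examples.SparkPi \
        --master yarn --deploy-mode cluster \
        $SPARK_HOME/examples/jars/spark-examples_2.11-2.1.1.jar 10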

Quick Test finished successfully.

 

I got the following error in YARN for the Full Test (only Spark Job selected):

 

17/06/29 02:42:36 INFO RMProxy: Connecting to ResourceManager at /0.0.0.0:8030
17/06/29 02:42:36 INFO YarnRMClient: Registering the ApplicationMaster
17/06/29 02:42:36 INFO YarnAllocator: Will request 1 executor container(s), each with 1 core(s) and 2432 MB memory (including 384 MB of overhead)
17/06/29 02:42:36 INFO YarnAllocator: Submitted 1 unlocalized container requests.
17/06/29 02:42:36 INFO ApplicationMaster: Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
17/06/29 02:42:37 INFO AMRMClientImpl: Received new token for : snd.hadoop.domain.com:33252
17/06/29 02:42:37 INFO YarnAllocator: Launching container container_1498686711343_0013_02_000002 on host snd.hadoop.domain.com
17/06/29 02:42:37 INFO YarnAllocator: Received 1 containers from YARN, launching executors on 1 of them.
17/06/29 02:42:37 INFO ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
17/06/29 02:42:37 INFO ContainerManagementProtocolProxy: Opening proxy : snd.hadoop.domain.com:33252
17/06/29 02:42:46 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (192.168.0.14:47894) with ID 1
17/06/29 02:42:46 INFO YarnClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
17/06/29 02:42:46 INFO YarnClusterScheduler: YarnClusterScheduler.postStartHook done
17/06/29 02:42:46 INFO BlockManagerMasterEndpoint: Registering block manager snd.hadoop.domain.com:38359 with 912.3 MB RAM, BlockManagerId(1, snd.hadoop.domain.com, 38359, None)
17/06/29 02:42:47 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 274.9 KB, free 366.0 MB)
17/06/29 02:42:48 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 22.9 KB, free 366.0 MB)
17/06/29 02:42:48 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.0.14:41278 (size: 22.9 KB, free: 366.3 MB)
17/06/29 02:42:48 INFO SparkContext: Created broadcast 0 from textFile at SparkTestCountJobRunner.java:43
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/tmp/radoop/root/tmp_1498687861970_0idqr77
    at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1958)
    at org.apache.spark.rdd.RDD.count(RDD.scala:1157)
    at org.apache.spark.api.java.JavaRDDLike$class.count(JavaRDDLike.scala:455)
    at org.apache.spark.api.java.AbstractJavaRDDLike.count(JavaRDDLike.scala:45)
    at eu.radoop.spark.SparkTestCountJobRunner.main(SparkTestCountJobRunner.java:45)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)

 

The error says "Input path does not exist: file:/tmp/radoop/root/tmp_1498687861970_0idqr77", but I found that folder in HDFS, and it contains a file with sample data.

 

Permissions on that folder are "drwxrwxrwx".
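(For comparison, the two views of that path can be listed side by side, assuming shell access to the node; the "file:/" form in the error resolves to the local one:)

    # the folder as seen by HDFS
    hdfs dfs -ls /tmp/radoop/root/
    # the same path on the node's local filesystem
    ls -l /tmp/radoop/root/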

 

Connection and logs are attached.

 

Any solutions?

 

3 REPLIES
RM Staff

Re: Radoop Full Test (Spark job) test error, Hadoop 2.8, Spark 2.1.1

Hi,

 

The path in the error, "file:/tmp/radoop/root/...", indicates a configuration problem in the submitted Spark job: it looks for the file on the local filesystem (of that particular node) instead of on HDFS. (It is the configuration that makes HDFS accessible; the "Wrong FS: hdfs..." message in stderr.html also points to this.)
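To illustrate the mechanism (a sketch, not taken from your files): without the cluster configuration on the job classpath, Hadoop falls back to its built-in default of fs.defaultFS = file:///, so paths resolve against the local filesystem. A properly loaded core-site.xml carries an entry like this, with host and port as placeholders here:

    <property>
      <name>fs.defaultFS</name>
      <value>hdfs://snd.hadoop.domain.com:8020</value>
    </property>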

 

In the connection XML, the "yarn.application.classpath" setting looks suspicious: it makes the classpath of the submitted job empty, so the configuration on the cluster is not loaded in the job. This may be the cause of the error. Disabling or removing it may make a difference.

 

Best,

Peter

 

 

Contributor I m_tarhsaz

Re: Radoop Full Test (Spark job) test error, Hadoop 2.8, Spark 2.1.1

I added "yarn.application.classpath" with an empty value according to your suggestion in the following topic, because I had the same problem:

Radoop connection error (Failed: fetching dynamic settings)

 

If I remove it, the old problem comes back.

RM Staff
Solution

Re: Radoop Full Test (Spark job) test error, Hadoop 2.8, Spark 2.1.1

I see.

 

But I also wrote: "If this does not work, the multi-line default value that is described in the link above for this property can be copy-pasted to the value cell instead." :) What happens if you set the value from here?

https://hadoop.apache.org/docs/r2.8.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml

 

$HADOOP_CONF_DIR, $HADOOP_COMMON_HOME/share/hadoop/common/*, $HADOOP_COMMON_HOME/share/hadoop/common/lib/*, $HADOOP_HDFS_HOME/share/hadoop/hdfs/*, $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*, $HADOOP_YARN_HOME/share/hadoop/yarn/*, $HADOOP_YARN_HOME/share/hadoop/yarn/lib/*
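For reference, on the cluster side this default corresponds to a yarn-site.xml entry along these lines (a sketch; the value must stay a single comma-separated list):

    <property>
      <name>yarn.application.classpath</name>
      <value>$HADOOP_CONF_DIR,$HADOOP_COMMON_HOME/share/hadoop/common/*,$HADOOP_COMMON_HOME/share/hadoop/common/lib/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/*,$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*,$HADOOP_YARN_HOME/share/hadoop/yarn/*,$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*</value>
    </property>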

 

I wonder whether these environment variables are actually set on the cluster nodes or not.
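(One way to check on a cluster node, assuming shell access: print a couple of the variables the entries rely on, or let Hadoop expand the effective classpath itself:)

    echo "HADOOP_CONF_DIR=$HADOOP_CONF_DIR"
    echo "HADOOP_YARN_HOME=$HADOOP_YARN_HOME"
    # prints the fully expanded classpath Hadoop itself would use
    hadoop classpath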

 
