
Radoop "Full Test" (Spark job) connection test error with Hadoop 2.8 and Spark 2.1

m_tarhsaz, Member
edited November 2018 in Help

I installed Hadoop 2.8, the Spark 2.1.0 binaries, RapidMiner 7.5.001, and Radoop 7.5.0.

The Hadoop version selected in the connection is "Apache Hadoop 2.2+" (the connection XML is attached).

I validated the Spark installation by running the SparkPi example.

The Quick Test finished successfully, but the Spark job test (Full Test) failed with this error in the YARN logs:

17/06/29 02:42:36 INFO RMProxy: Connecting to ResourceManager at /0.0.0.0:8030
17/06/29 02:42:36 INFO YarnRMClient: Registering the ApplicationMaster
17/06/29 02:42:36 INFO YarnAllocator: Will request 1 executor container(s), each with 1 core(s) and 2432 MB memory (including 384 MB of overhead)
17/06/29 02:42:36 INFO YarnAllocator: Submitted 1 unlocalized container requests.
17/06/29 02:42:36 INFO ApplicationMaster: Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
17/06/29 02:42:37 INFO AMRMClientImpl: Received new token for : snd.hadoop.domain.com:33252
17/06/29 02:42:37 INFO YarnAllocator: Launching container container_1498686711343_0013_02_000002 on host snd.hadoop.domain.com
17/06/29 02:42:37 INFO YarnAllocator: Received 1 containers from YARN, launching executors on 1 of them.
17/06/29 02:42:37 INFO ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
17/06/29 02:42:37 INFO ContainerManagementProtocolProxy: Opening proxy : snd.hadoop.domain.com:33252
17/06/29 02:42:46 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (192.168.0.14:47894) with ID 1
17/06/29 02:42:46 INFO YarnClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
17/06/29 02:42:46 INFO YarnClusterScheduler: YarnClusterScheduler.postStartHook done
17/06/29 02:42:46 INFO BlockManagerMasterEndpoint: Registering block manager snd.hadoop.domain.com:38359 with 912.3 MB RAM, BlockManagerId(1, snd.hadoop.domain.com, 38359, None)
17/06/29 02:42:47 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 274.9 KB, free 366.0 MB)
17/06/29 02:42:48 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 22.9 KB, free 366.0 MB)
17/06/29 02:42:48 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.0.14:41278 (size: 22.9 KB, free: 366.3 MB)
17/06/29 02:42:48 INFO SparkContext: Created broadcast 0 from textFile at SparkTestCountJobRunner.java:43
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/tmp/radoop/root/tmp_1498687861970_0idqr77
    at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1958)
    at org.apache.spark.rdd.RDD.count(RDD.scala:1157)
    at org.apache.spark.api.java.JavaRDDLike$class.count(JavaRDDLike.scala:455)
    at org.apache.spark.api.java.AbstractJavaRDDLike.count(JavaRDDLike.scala:45)
    at eu.radoop.spark.SparkTestCountJobRunner.main(SparkTestCountJobRunner.java:45)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)

The error says "org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/tmp/radoop/root/tmp_1498687861970_0idqr77", but when I checked HDFS I found the folder "/tmp/radoop/root/tmp_1498687861970_0idqr77", which contains a file with the Iris sample data.

The permissions on that folder are "drwxrwxrwx".
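One detail in the exception stands out: the missing path carries a file: prefix, which points at the local filesystem of the node running the ApplicationMaster rather than at HDFS. The sketch below models that resolution rule in simplified form, assuming (not confirmed by the logs) that the job receives a scheme-less path and qualifies it with fs.defaultFS, whose built-in Hadoop default is file:///. The qualify function and the host namenode.example.com are hypothetical; this is not Hadoop's or Radoop's actual code.

```python
from urllib.parse import urlparse

# Hadoop's built-in default for fs.defaultFS (core-default.xml) is the local
# filesystem; it applies when no core-site.xml overrides it.
HADOOP_DEFAULT_FS = "file:///"


def qualify(path: str, default_fs: str = HADOOP_DEFAULT_FS) -> str:
    """Hypothetical, simplified model of Hadoop path qualification.

    A path that already has a scheme is returned unchanged; a scheme-less
    absolute path inherits the scheme and authority of default_fs.
    """
    if urlparse(path).scheme:
        return path
    fs = urlparse(default_fs)
    if fs.scheme == "file":
        # Local paths are printed as file:/..., with no authority part,
        # matching the form seen in the InvalidInputException message.
        return f"file:{path}"
    return f"{fs.scheme}://{fs.netloc}{path}"


tmp = "/tmp/radoop/root/tmp_1498687861970_0idqr77"
print(qualify(tmp))  # file:/tmp/radoop/root/tmp_1498687861970_0idqr77
print(qualify(tmp, "hdfs://namenode.example.com:8020"))
# hdfs://namenode.example.com:8020/tmp/radoop/root/tmp_1498687861970_0idqr77
```

Under this assumption, the same scheme-less path would point at the existing HDFS folder only if the Spark job sees the cluster's fs.defaultFS setting, which would explain why the folder is visible in HDFS yet reported as missing.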

The YARN logs are attached.

So what's the problem?
