
Radoop Full Test (Spark job) error, Hadoop 2.8, Spark 2.1.1

m_tarhsaz Member Posts: 6 Contributor II
edited November 2018 in Help

I installed Hadoop 2.8 and Spark 2.1.1 as a single node in a VM, together with RapidMiner 7.5.001 and Radoop 7.5.

 

I selected "Apache Hadoop 2.2+" in Radoop Connection.

 

I validated the Spark installation with SparkPi.

Quick Test finished successfully.

 

I got the following error in YARN for the Full Test (only Spark Job selected):

 

17/06/29 02:42:36 INFO RMProxy: Connecting to ResourceManager at /0.0.0.0:8030
17/06/29 02:42:36 INFO YarnRMClient: Registering the ApplicationMaster
17/06/29 02:42:36 INFO YarnAllocator: Will request 1 executor container(s), each with 1 core(s) and 2432 MB memory (including 384 MB of overhead)
17/06/29 02:42:36 INFO YarnAllocator: Submitted 1 unlocalized container requests.
17/06/29 02:42:36 INFO ApplicationMaster: Started progress reporter thread with (heartbeat : 3000, initial allocation : 200) intervals
17/06/29 02:42:37 INFO AMRMClientImpl: Received new token for : snd.hadoop.domain.com:33252
17/06/29 02:42:37 INFO YarnAllocator: Launching container container_1498686711343_0013_02_000002 on host snd.hadoop.domain.com
17/06/29 02:42:37 INFO YarnAllocator: Received 1 containers from YARN, launching executors on 1 of them.
17/06/29 02:42:37 INFO ContainerManagementProtocolProxy: yarn.client.max-cached-nodemanagers-proxies : 0
17/06/29 02:42:37 INFO ContainerManagementProtocolProxy: Opening proxy : snd.hadoop.domain.com:33252
17/06/29 02:42:46 INFO YarnSchedulerBackend$YarnDriverEndpoint: Registered executor NettyRpcEndpointRef(null) (192.168.0.14:47894) with ID 1
17/06/29 02:42:46 INFO YarnClusterSchedulerBackend: SchedulerBackend is ready for scheduling beginning after reached minRegisteredResourcesRatio: 0.8
17/06/29 02:42:46 INFO YarnClusterScheduler: YarnClusterScheduler.postStartHook done
17/06/29 02:42:46 INFO BlockManagerMasterEndpoint: Registering block manager snd.hadoop.domain.com:38359 with 912.3 MB RAM, BlockManagerId(1, snd.hadoop.domain.com, 38359, None)
17/06/29 02:42:47 INFO MemoryStore: Block broadcast_0 stored as values in memory (estimated size 274.9 KB, free 366.0 MB)
17/06/29 02:42:48 INFO MemoryStore: Block broadcast_0_piece0 stored as bytes in memory (estimated size 22.9 KB, free 366.0 MB)
17/06/29 02:42:48 INFO BlockManagerInfo: Added broadcast_0_piece0 in memory on 192.168.0.14:41278 (size: 22.9 KB, free: 366.3 MB)
17/06/29 02:42:48 INFO SparkContext: Created broadcast 0 from textFile at SparkTestCountJobRunner.java:43
org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: file:/tmp/radoop/root/tmp_1498687861970_0idqr77
    at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
    at org.apache.spark.rdd.HadoopRDD.getPartitions(HadoopRDD.scala:202)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.rdd.MapPartitionsRDD.getPartitions(MapPartitionsRDD.scala:35)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:252)
    at org.apache.spark.rdd.RDD$$anonfun$partitions$2.apply(RDD.scala:250)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.rdd.RDD.partitions(RDD.scala:250)
    at org.apache.spark.SparkContext.runJob(SparkContext.scala:1958)
    at org.apache.spark.rdd.RDD.count(RDD.scala:1157)
    at org.apache.spark.api.java.JavaRDDLike$class.count(JavaRDDLike.scala:455)
    at org.apache.spark.api.java.AbstractJavaRDDLike.count(JavaRDDLike.scala:45)
    at eu.radoop.spark.SparkTestCountJobRunner.main(SparkTestCountJobRunner.java:45)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:637)

 

The error says "Input path does not exist: file:/tmp/radoop/root/tmp_1498687861970_0idqr77",

but I found that folder in HDFS, and it contains a file with sample data.

 

The permissions on that folder are "drwxrwxrwx".

 

Connection and logs are attached.

 

Any solutions?

 

Best Answer

  • phellinger Employee-RapidMiner, Member Posts: 103 RM Engineering
    Solution Accepted

    I see.

     

    But I also wrote "If this does not work, the multi-line default value that is described in the link above for this property can be copy-pasted to the value cell instead." :) What happens if you set the value from here?

    https://hadoop.apache.org/docs/r2.8.0/hadoop-yarn/hadoop-yarn-common/yarn-default.xml

     

    $HADOOP_CONF_DIR, $HADOOP_COMMON_HOME/share/hadoop/common/*, $HADOOP_COMMON_HOME/share/hadoop/common/lib/*, $HADOOP_HDFS_HOME/share/hadoop/hdfs/*, $HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*, $HADOOP_YARN_HOME/share/hadoop/yarn/*, $HADOOP_YARN_HOME/share/hadoop/yarn/lib/*

     

    I wonder whether these environment variables work on the cluster or not...
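One way to look into that question is to list which of the variables referenced in the classpath value are actually set. The following is a minimal sketch, not part of Radoop or Hadoop: the helper name `unresolved_variables` is hypothetical, and it only inspects the environment of the process it runs in (or a dictionary passed in explicitly). YARN expands these variables when a container is launched on a node, so the check would have to be run on the cluster node to be meaningful.

```python
import os
import re

# The default yarn.application.classpath value quoted in the post above.
DEFAULT_CLASSPATH = (
    "$HADOOP_CONF_DIR, "
    "$HADOOP_COMMON_HOME/share/hadoop/common/*, "
    "$HADOOP_COMMON_HOME/share/hadoop/common/lib/*, "
    "$HADOOP_HDFS_HOME/share/hadoop/hdfs/*, "
    "$HADOOP_HDFS_HOME/share/hadoop/hdfs/lib/*, "
    "$HADOOP_YARN_HOME/share/hadoop/yarn/*, "
    "$HADOOP_YARN_HOME/share/hadoop/yarn/lib/*"
)


def unresolved_variables(classpath, env=None):
    """Return the $VARIABLES used in classpath that are not defined in env.

    env defaults to the environment of the current process. Each missing
    name is reported once, in order of first appearance.
    """
    env = os.environ if env is None else env
    missing = []
    for name in re.findall(r"\$(\w+)", classpath):
        if name not in env and name not in missing:
            missing.append(name)
    return missing


if __name__ == "__main__":
    # Run this on the cluster node, as the user that launches YARN containers.
    for name in unresolved_variables(DEFAULT_CLASSPATH):
        print("not set:", name)
```

If any names are printed, the corresponding classpath entries would likely expand to nothing useful on that node, and replacing the variables with absolute paths in the value may be worth trying.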

     

Answers

  • phellinger Employee-RapidMiner, Member Posts: 103 RM Engineering

    Hi,

     

    the path in the error, "file:/tmp/radoop/root/...", indicates a configuration problem in the submitted Spark job: the job looks for the file on the local filesystem (of the particular node) instead of on HDFS. (It is the configuration that makes HDFS accessible. The "Wrong FS: hdfs..." message in stderr.html also indicates that.)
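The scheme in front of the path is what gives this away. As a small illustration, the sketch below extracts the scheme from the path in the error and from an HDFS path. The helper `filesystem_of` is a hypothetical name, and the `namenode:8020` address is an assumption made up for the example; it does not come from the attached connection or logs.

```python
from urllib.parse import urlparse


def filesystem_of(path):
    """Return the URI scheme of a Hadoop-style path, or "" if it has none."""
    return urlparse(path).scheme


def is_local(path):
    """True when the path explicitly points at the node's local filesystem."""
    return filesystem_of(path) == "file"


# Path copied from the stack trace in the question.
error_path = "file:/tmp/radoop/root/tmp_1498687861970_0idqr77"

# What a fully qualified HDFS path could look like; host and port are assumed.
hdfs_path = "hdfs://namenode:8020/tmp/radoop/root/tmp_1498687861970_0idqr77"

if __name__ == "__main__":
    for path in (error_path, hdfs_path):
        where = "local filesystem" if is_local(path) else "not local"
        print(path, "->", where)
```

A path with the "file" scheme is resolved on the node where the job runs, which would explain why the folder is visible in HDFS but not found by the job.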

     

    In the connection XML, the "yarn.application.classpath" setting is strange. It makes the classpath of the submitted job empty, so the configuration on the cluster is not loaded in the job; this may be the cause of the error. Disabling or removing it may make a difference.
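To spot such settings quickly, a property list can be scanned for entries with an empty value. This is only a sketch under a loud assumption: the layout of the Radoop connection XML is not shown in this thread, so the example uses the `<property><name>/<value>` layout of Hadoop's `*-site.xml` files instead. The sample document, the `fs.defaultFS` address, and the helper name `empty_properties` are all made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample in Hadoop *-site.xml layout (not the real connection file).
SAMPLE = """<configuration>
  <property>
    <name>yarn.application.classpath</name>
    <value></value>
  </property>
  <property>
    <name>fs.defaultFS</name>
    <value>hdfs://namenode:8020</value>
  </property>
</configuration>"""


def empty_properties(xml_text):
    """Return the names of properties whose value is missing or blank."""
    root = ET.fromstring(xml_text)
    empty = []
    for prop in root.iter("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if name and not value:
            empty.append(name)
    return empty


if __name__ == "__main__":
    for name in empty_properties(SAMPLE):
        print("empty value:", name)
```

An empty value is not necessarily wrong for every property, so the output is a list of candidates to review rather than a list of errors.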

     

    Best,

    Peter

     

     

  • m_tarhsaz Member Posts: 6 Contributor II

    I added "yarn.application.classpath" with an empty value, following your suggestion in the topic below, because I had the same problem.

    Radoop connection error (Failed: fetching dynamic settings)

     

    If I remove it, that old problem will come back.
