Questions tagged [yarn]

0 votes · 0 answers · 2 views

Need to provide password for yarn add jest

I'm trying to yarn add jest to the work repo I cloned onto my machine. C:\COMPANY_NAME\Work-Folder\frontend>yarn add jest yarn add v1.13.0 [1/4] Resolving packages... [2/4] Fetching packages... error Command failed. Exit code: 128 Command: git Arguments: ls-remote --tags --heads [email protected]:COMP...
J.Ko
0 votes · 3 answers · 15 views

Where does log4j write the logs in cluster mode?

Purpose: store custom logs from a streaming app to an HDFS or UNIX directory. I am running a Spark streaming program in cluster mode, but the logs are not getting written to the given log path. I checked both HDFS and the local directory. Via the log4j debug property I can see files in action. Am I mis...
Elvish_Blade
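For questions like this, the usual answer (a sketch, not necessarily the asker's setup): in yarn-cluster mode each executor writes under its own container log directory, which Spark exposes to log4j as `spark.yarn.app.container.log.dir`; pointing a file appender there lets YARN's log aggregation ship the files to HDFS. Appender and file names below are illustrative:

```properties
# custom log4j.properties shipped to executors (e.g. via --files)
log4j.rootCategory=INFO, file
log4j.appender.file=org.apache.log4j.FileAppender
# YARN substitutes the per-container log directory at runtime
log4j.appender.file.File=${spark.yarn.app.container.log.dir}/app.log
log4j.appender.file.layout=org.apache.log4j.PatternLayout
log4j.appender.file.layout.ConversionPattern=%d %p %c: %m%n
```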
1 vote · 1 answer · 729 views

Header information of `yarn top` command

The usual top command on UNIX and Mac OS was extended to Hadoop in its recent versions; some information about it is given here. It has the following headers: APPLICATIONID USER TYPE QUEUE #CONT #RCONT VCORES RVCORES MEM RMEM VCORESECS MEMSECS %PROGR TIME NAME. I was wondering what #RCONT, RVCORES a...
Chitral Verma
0 votes · 0 answers · 4 views

TypeScript compiler (tsc) throws an error, but only when used in a yarn script; otherwise it works fine

So if I execute tsc, everything compiles with no issues and it picks up the patterns in tsconfig.json. But with # inside package.json scripts: { "build" : "tsc" }, running $ yarn build throws an error: error TS6307: File '***' is not in project file list. Projects must list all files or use an 'include' pattern. error Command faile...
Sandwich
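TS6307 typically means the tsconfig.json that tsc resolves from the script's working directory lists explicit "files" that omit the one named in the error; a common fix is an "include" glob instead. A hedged sketch (the paths are made up, not from the question):

```json
{
  "compilerOptions": { "outDir": "dist" },
  "include": ["src/**/*"]
}
```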
1 vote · 1 answer · 874 views

Apache Spark: Yarn logs Analysis

I am having a spark-streaming application, and I want to analyse the logs of the job using Elasticsearch-Kibana. My job is run on yarn cluster, so the logs are getting written to HDFS as I have set yarn.log-aggregation-enable to true. But, when I try to do this : hadoop fs -cat ${yarn.nodemanager.r...
void
1 vote · 1 answer · 572 views

YARN MapReduce does not have enough RAM

I am trying to launch my application in YARN MapReduce. I have set up MapReduce on 4 hosts (1 resource manager and 4 node managers). Each host has 2 cores and 4 GB of RAM. When I run my application, it aborts because of lack of RAM [1]. How should I set up YARN MapReduce so that jobs won't run out of RAM?...
xeon123
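Questions like this usually come down to the per-node and per-container memory caps in yarn-site.xml and mapred-site.xml. A hedged sketch for a 4 GB host (the numbers are illustrative, not a recommendation):

```xml
<!-- yarn-site.xml: leave ~1 GB for the OS and daemons on a 4 GB host -->
<property>
  <name>yarn.nodemanager.resource.memory-mb</name>
  <value>3072</value>
</property>
<!-- mapred-site.xml: container size, and the JVM heap inside it -->
<property>
  <name>mapreduce.map.memory.mb</name>
  <value>1024</value>
</property>
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx820m</value>
</property>
```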
0 votes · 0 answers · 3 views

hadoop - resourcemanager doesn't execute

I have a 4-node cluster. It worked well before I turned off a server. (When I turned off this server, I ran "stop-yarn.sh" and "stop-dfs.sh".) The 2nd PC is the resourcemanager. I executed start-yarn.sh and checked the result of jps. The other nodes run the nodemanager process, but there was no resourcemanager process on the 2n...
zeee1
0 votes · 1 answer · 6 views

How to set user login credentials for the Spark web UI in an open source Apache Spark cluster

We are using an open source Apache Spark cluster in our project and need some help with the following: How do I enable login credentials for the Spark web UI? How do I disable the "kill button" option in the Spark web UI? Can someone help me with question 1 or 2, or both? Thanks in advance.
Chandra
0 votes · 0 answers · 3 views

increase spark cluster size automatically

I have a Spark cluster with YARN and Hadoop in Azure using HDInsight. My cluster consists of 2 head nodes and 3 workers. My question is: is it possible to increase the cluster size automatically? Thank you.
magic_banana
1 vote · 1 answer · 2.9k views

Yarn, node manager and resource manager

In YARN, which of the following daemons takes care of the containers and of resource utilization by the applications? Node Manager, Job Tracker, Task Tracker, Application Master, Resource Manager. I am confused: containers are taken care of by the node manager, and resource utilization by applicatio...
JJJ
1 vote · 2 answers · 4.4k views

How to set up Spark cluster on Windows machines?

I am trying to set up a Spark cluster on Windows machines. The way to go here is using the Standalone mode, right? What are the concrete disadvantages of not using Mesos or YARN? And how much pain would it be to use either one of those? Does anyone have some experience here?
user2306380
1 vote · 3 answers · 5.8k views

Ant BuildException error building Hadoop 2.2.0

I've been having trouble to build Hadoop 2.2.0 using Maven 3.1.1, this is part of the output I get (full log at http://pastebin.com/FE6vu46M): [INFO] ------------------------------------------------------------------------ [INFO] Reactor Summary: [INFO] [INFO] Apache Hadoop Main ......................
Río
1 vote · 1 answer · 4.4k views

Apache Hive on Yarn

As per my understanding from blogs, YARN (mapred2) is faster or smarter than Hadoop's MapReduce. If that's true, is there a way to configure Hive to use YARN/mapred2 without any complications, to improve performance or increase utilization of resources?
Murali Mopuru
1 vote · 1 answer · 569 views

How to use JobClient in hadoop2(yarn)

(Solved) I want to contact the Hadoop cluster and get some job/task information. In hadoop1, I was able to use JobClient (local pseudo-distributed mode, using Eclipse): JobClient jobClient = new JobClient(new InetSocketAddress("127.0.0.1",9001),new JobConf(config)); JobID job_id = JobID.forName("job_xxxxx...
user2457766
1 vote · 2 answers · 1k views

Launch mapreduce job on hadoop 2.2 (Yarn) from java application

I'm trying to call a MapReduce job from a Java application. In former Hadoop versions (1.x) I created a Configuration object and a Job object, set mapred.job.tracker and fs.default.name in the Configuration, and ran the Job. Now, in Hadoop 2.x the job tracker does not exist anymore neither exists th...
user3570620
1 vote · 2 answers · 672 views

How to reset Iterator on MapReduce Function in Apache Spark

I'm a newbie with Apache Spark. I want to know how to reset the pointer to the Iterator in a MapReduce function in Apache Spark. I wrote Iterator iter = arg0; but it isn't working. Following is a class implementing a MapReduce function in Java: class CountCandidates implements Serializable, PairFl...
Likoed
1 vote · 3 answers · 3.8k views

How to eliminate Error util.Shell: Failed to locate the winutils binary

I am executing a remote job from a Windows machine (the client) under Eclipse. I should clarify that I don't have any Hadoop installation on my Windows client, and I don't need one; I am executing the Hadoop job remotely, and Hadoop is installed on a Linux machine. Everything executes correctly, but I would...
Kaiser
1 vote · 3 answers · 3k views

java.io.IOException: Cannot initialize Cluster in Hadoop2 with YARN

This is my first time posting to stackoverflow, so I apologize if I did something wrong. I recently set up a new hadoop cluster, and this is my first time trying to use Hadoop 2 and YARN. I currently get the following error when I submit my job. java.io.IOException: Cannot initialize Cluster. Please...
Jason Arnold
1 vote · 3 answers · 1.9k views

Disable Application report for a Spark Job

When I submit a Spark job (on AWS-EMR), I have a lot of "INFO log" on the console: 15/02/17 19:44:46 INFO yarn.Client: Application report for application_1455192031517_0006 (state: ACCEPTED) 15/02/17 19:44:47 INFO yarn.Client: Application report for application_1455192031517_0006 (state: RUNNING) :...
Edamame
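Those repeated report lines come from the org.apache.spark.deploy.yarn.Client logger at INFO level, so one sketch of a fix (assuming a stock conf/log4j.properties on the submitting machine) is to raise only that logger's threshold:

```properties
# conf/log4j.properties on the machine running spark-submit
log4j.logger.org.apache.spark.deploy.yarn.Client=WARN
```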
1 vote · 1 answer · 4k views

Not able to see Job History(http://localhost:19888) page in web browser in Hadoop

I am using Hadoop version 2.4.1 on Ubuntu 14.04 32 bit. When I run a sample job using hadoop jar user_jar.jar command, I am not able to see output on http://localhost:19888 (Page not found) What could be the possible reason ? Thank you in advance. JPS output : 3931 Jps 3719 NodeManager 3420 Seconda...
user2302742
1 vote · 1 answer · 2.3k views

Could not deallocate container for task attemptId NNN

I'm trying to understand how containers allocate memory in YARN, and their performance under different hardware configurations. The machine has 30 GB of RAM; I picked 24 GB for YARN and left 6 GB for the system. yarn.nodemanager.resource.memory-mb=24576 Then I followed http://docs.hortonwo...
dreamer
1 vote · 3 answers · 1.2k views

how to ignore key-value pair in Map-Reduce if values are blank?

I have a tab-separated input file from which I am reading 2 columns in MapReduce: one column is the key and the other the value. My requirement is: if the value is blank, i.e. it contains a space, tab, or other such character, even the key should not be passed to the reducer. In whole, it should discard t...
Shash
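The filtering this question describes is easy to sketch outside Hadoop: drop the whole record when the value field is blank, so its key never reaches a reducer. A minimal Python sketch (the function name and sample records are illustrative, not the Hadoop API):

```python
def mapper(line):
    """Split a tab-separated record into a (key, value) pair,
    yielding nothing when the value column is missing or blank."""
    parts = line.split("\t", 1)
    if len(parts) < 2:
        return  # no value column at all: discard the record
    key, value = parts
    if not value.strip():
        return  # value is empty or whitespace-only: discard the key too
    yield key, value

records = ["a\thello", "b\t   ", "c", "d\tworld"]
pairs = [pair for line in records for pair in mapper(line)]
print(pairs)  # only the 'a' and 'd' records survive
```

In an actual Hadoop mapper the same check would simply return without calling context.write(), which is what keeps the key out of the shuffle entirely.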
1 vote · 1 answer · 3.4k views

yarn is using 100% resources when running a hive job

I'm running a Hive Tez job. The job loads the data from one table in text file format into another table in ORC format. I'm using INSERT INTO TABLE ORDERREQUEST_ORC PARTITION(DATE) SELECT COLUMN1, COLUMN2, COLUMN3, DATE FROM ORDERREQUEST_TXT; When I'm monitoring the job through...
Rahul Reddy
1 vote · 1 answer · 1.7k views

Spark assembly file uploaded despite spark.yarn.conf being set

I submit jobs to a Spark cluster running on Yarn using spark-submit sometimes through a relatively slow connection. In order to avoid uploading the 156MB spark-assembly file for each job, I set the configuration option spark.yarn.jar to the file on HDFS. However, this does not avoid the upload, but...
Carsten
1 vote · 3 answers · 1.7k views

Why does launching spark-shell with yarn-client fail with “java.lang.ClassNotFoundException: org.apache.hadoop.fs.FSDataInputStream”?

I am trying to set up a cluster at home for my personal needs (learning). First I made Hadoop+Yarn. MR2 is working. Second - I am trying to add Spark but getting an error about missing classes. [[email protected] conf]# spark-shell --master yarn-client Exception in thread "main" java.lang.NoClassDefFoundE...
IgorZ
1 vote · 2 answers · 8k views

how to increase java heap size in Hadoop

I am using Hadoop version 2.6.0 and trying to run a Hive insert into a table, where I got a Java heap error. Is there any way I can increase the heap size in Hadoop throughout the cluster? Thanks in advance.
shaik mahammed
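For context, the MapReduce task heap is usually raised cluster-wide via the *.java.opts properties in mapred-site.xml; the values below are illustrative, not a recommendation:

```xml
<!-- mapred-site.xml: JVM heap for map and reduce tasks -->
<property>
  <name>mapreduce.map.java.opts</name>
  <value>-Xmx2048m</value>
</property>
<property>
  <name>mapreduce.reduce.java.opts</name>
  <value>-Xmx4096m</value>
</property>
```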
1 vote · 1 answer · 707 views

NoClassDefFoundError org/apache/hadoop/yarn/server/timelineservice/collector/TimelineCollectorManager

I get this error when I execute the start-all.cmd command. Also, I am unable to access http://localhost:8088, but I am able to access http://localhost:9870. The error below is from the Resource Manager command prompt: FATAL resourcemanager.ResourceManager: Error starting ResourceManager java.lang...
Hitesh Somani
0 votes · 0 answers · 3 views

Spark - how does Spark execute a job and create stages and tasks? (for the scenario below)

Consider the following scenario: input data size (reading from HDFS): 20 GB; number of executors: 2; executor memory: 8 GB; RDD partition factor: 2; and we run a Spark job in client mode. So in this case: 1. How will the total 20 GB of data get processed through the Spark job? 2. How many stages and tasks will get created?...
sunilgaikwad
1 vote · 2 answers · 2.5k views

Setting YARN queue in PySpark

When creating a Spark context in PySpark, I typically use the following code: conf = (SparkConf().setMaster("yarn-client").setAppName(appname) .set("spark.executor.memory", "10g") .set("spark.executor.instances", "7") .set("spark.driver.memory", "5g") .set("spark.shuffle.service.enabled","true") .se...
Tim
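On the YARN side, the queue is selected with the spark.yarn.queue property (e.g. .set("spark.yarn.queue", ...) on the SparkConf shown in the excerpt, or --queue on spark-submit). A small PySpark-free sketch that just renders such settings as spark-submit flags; the queue name "analytics" is made up:

```python
def submit_flags(settings):
    """Render a dict of Spark properties as spark-submit --conf flags."""
    return " ".join(f"--conf {k}={v}" for k, v in sorted(settings.items()))

settings = {
    "spark.yarn.queue": "analytics",   # hypothetical queue name
    "spark.executor.memory": "10g",
    "spark.executor.instances": "7",
}
print(submit_flags(settings))
```

Setting the queue in SparkConf and passing --queue to spark-submit are equivalent; the property wins only if the flag is absent.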
2 votes · 0 answers · 69 views

Push Data to Secure ElasticSearch from PySpark - Certificate Issue

I have a ElasticSearch Cluster with SearchGuard Enabled. I am trying to push data into ElasticSearch with Spark. OS - CentOS7 ElasticSearch Version - 6.4.1 Spark - 2.3.0 Java - openjdk-1.8.0 Yarn - 2.7.3 HDFS - 2.7.3 HDP - 2.6.5.0 ElasticSearch has been secured with SearchGuard via PEM key. The chai...
ArnavRay
0 votes · 0 answers · 5 views

Config parameter to kill spark job if it cannot get yarn containers

Is there a Spark config parameter we can pass while submitting jobs through spark-submit that will kill/fail the job if it does not get containers in a given time? For example, if a job requested 8 YARN containers which could not be allocated for 2 hours, then the job would kill itself.
user10439725
9 votes · 1 answer · 817 views

Not able to invoke a spark application using a java class on cluster

Below is my project's structure: spark-application: scala1.scala // I am calling the java class from this class. java.java // this will submit another spark application to the yarn cluster. The spark-application that is being triggered by java class: scala2.scala My reference tutorial is here When I...
ankush reddy
9 votes · 0 answers · 263 views

yarn workspace deploy into a docker image

I am using yarn workspaces and I have this packages in my package.json: "workspaces": ["packages/*"] I am trying to create a docker image to deploy and I have the following Dockerfile: # production dockerfile FROM node:9.2 # add code COPY ./packages/website/dist /cutting WORKDIR /cutting COPY packag...
dagda1
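One common shape for this kind of Dockerfile (a sketch, assuming the root package.json declares the workspaces and packages/website is the app being shipped, as in the question):

```dockerfile
FROM node:9.2
WORKDIR /cutting
# copy the manifests first so the install layer is cached across builds
COPY package.json yarn.lock ./
COPY packages/website/package.json packages/website/
RUN yarn install --frozen-lockfile
# then copy the built output
COPY packages/website/dist packages/website/dist
```

Copying only the manifests before yarn install keeps Docker's layer cache valid until a dependency actually changes.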
1 vote · 2 answers · 2.7k views

How to improve performance of loading data from NON Partition table into ORC partition table in HIVE

I'm new to Hive querying and I'm looking for best practices to retrieve data from a Hive table. We have enabled Tez as the execution engine and enabled vectorization. We want to do reporting from a Hive table; I read in the Tez documentation that it can be used for real-time reporting. The scenario is, from my web app...
user145610
1 vote · 1 answer · 2.1k views

Spark ExecutorLostFailure Memory Exceeded

I have been trying to get a Spark job to run to completion for several days now, and I was finally able to get it to complete, but there was still a large number of failed tasks where executors were being killed with the following message: ExecutorLostFailure (executor 77 exited caused by one of th...
Nathan Case
1 vote · 2 answers · 5.7k views

Spark Hive reporting pyspark.sql.utils.AnalysisException: u'Table not found: XXX' when run on yarn cluster

I'm attempting to run a pyspark script on BigInsights on Cloud 4.2 Enterprise that accesses a Hive table. First I create the hive table: [[email protected] ~]$ hive hive> CREATE TABLE pokes (foo INT, bar STRING); OK Time taken: 2.147 seconds hive> LOAD DATA LOCAL INPATH '/usr/iop/4.2....
Chris Snow
1 vote · 2 answers · 2.5k views

Permission Denied error while running start-dfs.sh

I am getting this error while performing start-dfs.sh Starting namenodes on [localhost] [email protected]: localhost: rcmd: socket: Permission denied Starting datanodes [email protected]: localhost: rcmd: socket: Permission denied Starting secondary namenodes [Gaurav] [email protected]: Gaurav: rcmd: socket: Permis...
Gaurav A Dubey
2 votes · 2 answers · 398 views

Spark streaming with Yarn: executors not fully utilized

I am running Spark streaming with YARN with spark-submit --master yarn --deploy-mode cluster --num-executors 2 --executor-memory 8g --driver-memory 2g --executor-cores 8 .. I am consuming Kafka through the DirectStream approach (no receiver). I have 2 topics (each with 3 partitions). I repartition the RD...
Nishant Kumar
2 votes · 1 answer · 1.9k views

Apache Spark: NullPointerException on broadcast variables (YARN cluster mode)

I have a simple Spark application, where I am trying to broadcast a String-type variable on a YARN cluster. But every time I try to access the broadcasted variable's value, I get null within the task. It would be really helpful if you could suggest what I am doing wrong here. My cod...
2 votes · 1 answer · 456 views

YARN minimum-user-limit-percent not working?

I'm using the capacity scheduler in YARN and I saw that there's the possibility for users to get a minimum percentage of the queue by using the property 'yarn minimum-user-limit-percent'. I set this property to 20, and what I would expect is that resources would get equally distributed up to 5 users...
Cristina Luengo
