Questions tagged [cloudera]

1 vote · 2 answers · 5.9k views

Hive tables not found in Spark SQL - spark.sql.AnalysisException in Cloudera VM

I am trying to access Hive tables through a Java program, but it looks like my program cannot see any tables in the default database. However, I can see the same tables and query them through spark-shell. I have copied hive-site.xml into the Spark conf directory. The only difference: the spark-shell is running...
Joydeep
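A minimal PySpark sketch of the usual fix for this kind of symptom (the question uses Java, but the API is analogous; on Spark 1.6 the equivalent is creating a HiveContext instead of a plain SQLContext): the application must enable Hive support and pick up hive-site.xml from its classpath, otherwise Spark falls back to an in-memory catalog. The table name web_logs is only a placeholder.

    from pyspark.sql import SparkSession

    # Assumes hive-site.xml is in $SPARK_HOME/conf (or otherwise on the classpath).
    spark = (SparkSession.builder
             .appName("hive-table-check")
             .enableHiveSupport()   # without this, Hive tables are invisible
             .getOrCreate())

    spark.sql("SHOW TABLES IN default").show()
    spark.sql("SELECT * FROM default.web_logs LIMIT 10").show()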
1 vote · 1 answer · 215 views

Not able to use HBaseTestingUtility with CDH 5.7

I am trying to use HBaseTestingUtility with CDH 5.7, as described in the blog post and GitHub repository below: http://blog.cloudera.com/blog/2013/09/how-to-test-hbase-applications-using-popular-tools/ https://github.com/sitaula/HBaseTest I have modified my pom.xml for CDH 5.7 as follows: 4.0.0 HBaseTest Test 0.0.1-SN...
tuk
1 vote · 0 answers · 55 views

Pyspark error reading file. Flume HDFS sink imports file with user=flume and permissions 644

I'm using Cloudera Quickstart VM 5.12. I have a Flume agent moving CSV files from a spooldir source into an HDFS sink. The operation works fine, but the imported files have: User=flume Group=cloudera Permissions=-rw-r--r-- The problem starts when I use PySpark and get: PriviledgedActionException as:cloude...
Taka
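A hedged sketch of how one might diagnose and work around the permission error on a test VM; the sink path /user/cloudera/flume_in is a placeholder, and the chmod/chown workaround assumes simple (non-Kerberos) authentication on the Quickstart VM.

    import subprocess

    sink_dir = "/user/cloudera/flume_in"   # placeholder for the Flume HDFS sink path

    # Inspect ownership and permissions of what Flume wrote (files and parent dirs).
    subprocess.call(["hdfs", "dfs", "-ls", "-R", sink_dir])

    # Workaround on a sandbox: make the directory readable/traversable, or chown it
    # to the user that runs the PySpark job.
    subprocess.call(["hdfs", "dfs", "-chmod", "-R", "755", sink_dir])
    # subprocess.call(["sudo", "-u", "hdfs", "hdfs", "dfs", "-chown", "-R",
    #                  "cloudera:cloudera", sink_dir])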
1 vote · 0 answers · 149 views

Hive describe shows partition also as column but describe formatted doesn't

Hive table created: create external table ini(id string, rand string) partitioned by (tmp string) Describe: describe ini; Output from Hue: Describe formatted: describe formatted ini; Output from Hue: Why is the partition column shown in the column list of Hive's describe output? describe formatted seems to...
Ani Menon
1 vote · 0 answers · 179 views

Spark streaming from Kafka returns results locally but not on YARN

I am using Cloudera's CDH 5.12 VM, Spark v1.6, Kafka v0.10 (installed via yum) and Python 2.6.6. I am following this link for the Spark settings: Below is a simple Spark application that I am running. It takes events from Kafka and prints them after a map-reduce step. from __future__ import print_function import sys fr...
Samhash
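For reference, a minimal PySpark 1.6 direct-stream job of the kind described; the broker address and topic are placeholders. A common reason such a job works locally but not on YARN is that the spark-streaming-kafka assembly jar or the broker hostname is not reachable from the executors, so both are called out in the comments.

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext
    from pyspark.streaming.kafka import KafkaUtils

    sc = SparkContext(appName="kafka-wordcount")
    ssc = StreamingContext(sc, 10)  # 10-second batches

    # On YARN, ship the spark-streaming-kafka assembly (e.g. via --jars) and make
    # sure executors can resolve the broker hostname.
    stream = KafkaUtils.createDirectStream(
        ssc, ["test-topic"], {"metadata.broker.list": "quickstart.cloudera:9092"})

    counts = (stream.map(lambda kv: kv[1])
                    .flatMap(lambda line: line.split(" "))
                    .map(lambda word: (word, 1))
                    .reduceByKey(lambda a, b: a + b))
    counts.pprint()

    ssc.start()
    ssc.awaitTermination()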
1 vote · 0 answers · 357 views

How can I find the latest partition in Impala tables?

I need to collect incremental stats on a table frequently; for that, I need to populate the latest partition into the variable below: compute incremental stats someSchema.someTable partition (partitionColName=${value}); I have a few options, but I don't want to use them for stability and perfo...
roh
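A hedged sketch using impyla (the host, port and the assumption that the partition column is a string are placeholders): fetch the newest partition value first, then compute incremental stats for just that partition.

    from impala.dbapi import connect

    conn = connect(host="impala-daemon.example.com", port=21050)  # placeholders
    cur = conn.cursor()

    # One option: ask for the newest value of the partition column itself.
    cur.execute("SELECT max(partitionColName) FROM someSchema.someTable")
    latest = cur.fetchone()[0]

    cur.execute(
        "COMPUTE INCREMENTAL STATS someSchema.someTable "
        "PARTITION (partitionColName='{0}')".format(latest))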
1 vote · 3 answers · 383 views

How to overwrite data from a text file into a Hive table, replacing rows for a specific date or value

I am using the Cloudera distribution with Hive version 'hive-common-1.1.0-cdh5.14.0', i.e. Hive 1.1.0. Below is my Hive table: hive> describe test; OK id int name string day...
Chaithu
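One common pattern for this, sketched below with Spark SQL against the Hive metastore, under the assumption that the table can be (re)created partitioned by the day column: INSERT OVERWRITE then replaces only the affected partition. The staging table name and the date value are placeholders.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    spark.sql("""
      CREATE TABLE IF NOT EXISTS test_partitioned (id INT, name STRING)
      PARTITIONED BY (day STRING)
    """)

    # Overwrite only the rows for one day with the freshly loaded text-file data.
    spark.sql("""
      INSERT OVERWRITE TABLE test_partitioned PARTITION (day='2018-01-15')
      SELECT id, name FROM staging_test WHERE day='2018-01-15'
    """)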
1 vote · 0 answers · 63 views

CDH Spark Streaming consumer for Kerberos-enabled Kafka

Has anyone tried to use Spark Streaming (PySpark) as a consumer for Kerberos-enabled Kafka in CDH? I searched the CDH documentation and only found some Scala examples. Does that mean CDH does not support this? Can anyone help with this?
znever
1 vote · 1 answer · 783 views

Failed to connect to server: quickstart.cloudera/10.0.2.15:8032

[cloudera@quickstart ~]$ sqoop import -connect jdbc:mysql://localhost/test -username root -P -table transactions -m 1 When executing the above command, I get the following exception. Warning: /usr/lib/sqoop/../accumulo does not exist! Accumulo imports will fail. Please set $ACCUMULO_HOME to the root...
Jagadeesh
1 vote · 0 answers · 38 views

Hive is not running in local mode in the Cloudera distribution

When I try to start Hive I get the following error. I am using CDH 5.12 in local mode: Exception in thread 'main' java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang.RuntimeException: org.apache.hadoop.hive.ql.metadata.HiveException: java.lang...
Ravindra Mishra
1 vote · 1 answer · 88 views

How to find the CDH (Cloudera's Distribution Including Apache Hadoop) version using HUE

I only have access to HUE and am trying to find out the CDH version through it. Could you please help me with this?
Ash
1 vote · 1 answer · 188 views

Expanding HDFS memory in Cloudera QuickStart on docker

I am trying to use the Cloudera QuickStart Docker image, but it seems that there is no free space on HDFS (0 bytes). After starting the container with docker run --hostname=$HOSTNAME -p 80:80 -p 7180:7180 -p 8032:8032 -p 8030:8030 -p 8888:8888 -p 8983:8983 -p 50070:50070 -p 50090:50090 -p 50075:50075 -p...
Alex
1 vote · 1 answer · 35 views

Install Cloudera on Windows without a virtual machine/box: is it possible with any package?

On Windows Server 2008 R2 we cannot run Cloudera in a VM, so we are looking for a way to install it without a VM dependency. Can we do that?
Nilesh Pandey
1 vote · 0 answers · 58 views

What is the recommended value for the HDFS DataNode cache?

How much memory should be set for HDFS DataNode caching? OS: CentOS Linux 7.4. The relevant property is dfs.datanode.max.locked.memory, which determines the maximum amount of memory a DataNode will use for caching.
jBee
1 vote · 1 answer · 318 views

Can't execute any Hadoop command after installing Cloudera Manager

I've set up my CDH cluster (5.14.0) successfully; it includes 4 nodes and services including HDFS, YARN, ZooKeeper and Impala. The Cloudera Manager web page works fine, but when I open a terminal and try to run any Hadoop command, like hadoop, impala or anything else, the shell prompts 'comma...
Simpson Yang
1 vote · 0 answers · 81 views

Not able to put file on HDFS

I have a CDH VirtualBox VM running on my Windows 10 machine. I am running a simple Talend job which has only one component, tHDFSPut, to copy a file from Windows to HDFS inside the virtual box. When I run the job the file is created on HDFS but it is empty, and I get the following error: org.ap...
Vish
1 vote · 0 answers · 142 views

Spark job creates too many tasks

I am developing code in Scala to run on a Cloudera cluster. My code is: def func_segment (model: String) : String = { if(model == "A1" || model == "B1" || model == "C1" || model == "D1") "NAME1" else if (model == "A2" || model == "B2") "NAME2" else "NAME3" } val func_segment_udf = udf((model...
1 vote · 0 answers · 39 views

HBase - SAS Integration and Read

I have a Cloudera cluster (Kerberos enabled) with HBase running on it. I need to read/write a few HBase tables, with a filter condition, from an external SAS server. I am trying to achieve this through Thrift and Python: I have installed Python on my SAS server and am accessing HBase t...
Srini Ravi
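A minimal happybase sketch of reading through the HBase Thrift gateway with a server-side filter; the host, port, table, column and filter value are placeholders, and a Kerberos-enabled Thrift server additionally needs SASL configuration that is not shown here.

    import happybase

    connection = happybase.Connection(host="hbase-thrift.example.com", port=9090)
    table = connection.table("my_table")

    # Server-side filter: only rows whose 'cf:status' column equals 'ACTIVE'.
    flt = "SingleColumnValueFilter('cf', 'status', =, 'binary:ACTIVE')"
    for row_key, columns in table.scan(filter=flt, limit=100):
        print(row_key, columns)

    connection.close()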
1 vote · 0 answers · 33 views

JanusGraph Hbase repeatedly calls HBaseKeyColumnValueStore.getFilters()

We are running a Java application, which uses a JanusGraph backed by a Hbase table on a Cloudera cluster. We use the janusgraph-hbase dependency, v0.2. When running our app, we see these lines appear in the logs: 20180330 15:00:27;DEBUG;HBaseKeyColumnValueStore:145;Generated HBase Filter FilterList...
Charles
1 vote · 1 answer · 171 views

Change the starting day of the week returned by impala trunc()

I am using Impala to find the starting day of the week, like this: select TRUNC('2018-01-01', 'D') which gives the start day based on a Monday-to-Sunday week. Is there any way to change this behavior to give me a Sunday-to-Saturday week? I need to change it for my query only; changing a server or cluster wi...
Tony
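One query-local workaround, sketched here through impyla (connection details are placeholders): shift the date forward one day before TRUNC(..., 'D') and shift the result back, which turns the Monday-based week into a Sunday-based one.

    from impala.dbapi import connect

    cur = connect(host="impala-daemon.example.com", port=21050).cursor()
    cur.execute("""
      SELECT TRUNC(DAYS_ADD(CAST('2018-01-06' AS TIMESTAMP), 1), 'D')
             - INTERVAL 1 DAY
    """)
    print(cur.fetchone()[0])   # 2017-12-31 00:00:00, the Sunday starting that week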
1 vote · 0 answers · 58 views

Development environment for Cloudera

I'm a beginner in Big Data development and am currently using Cloudera services in a shared environment, where I'm building Spark scripts for data ingestion using Jupyter notebooks. I'm not sure whether this is a good approach, because I miss IDE features such as code completion, auto-import, a debugger and...
Thales Rocha
1 vote · 0 answers · 2.1k views

Create Hive Partitioned Table

How to create a table T1 with partition P1 and table T2's columns? create table T2(F1 int, F2 varchar(101), ..., FN date); create table T1 as select * from T2 partitioned by (P1 int); Error thrown: AnalysisException: Syntax error in line 1:undefined: ...2 as (select * from T1) partitioned by (P1 int...
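Assuming the AnalysisException comes from Impala, the PARTITIONED BY clause belongs before AS SELECT and the partition column must be the last item in the select list; in Hive 1.x a partitioned CTAS is not supported at all, so there you would create T1 first and then INSERT ... SELECT with dynamic partitioning. A sketch via impyla, with the connection details and the constant partition value as placeholders:

    from impala.dbapi import connect

    cur = connect(host="impala-daemon.example.com", port=21050).cursor()
    cur.execute("""
      CREATE TABLE T1 PARTITIONED BY (P1)
      AS SELECT F1, F2, FN, 2018 AS P1 FROM T2
    """)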
1 vote · 0 answers · 74 views

Fault Tolerance in Apache Livy

Does anyone have insights into achieving fault tolerance in Apache Livy? Say, for instance, the Livy server fails: how can we achieve HA?
Sumit Khurana
1 vote · 0 answers · 47 views

How to get the scheduler of an already finished job in Hadoop YARN?

I'm modifying mapred-site.xml and the scheduler-specific configuration files for Hadoop, and I want to make sure that the modifications I have made to the default scheduler (FIFO) have actually taken effect. How can I check the scheduler applied to a...
mani
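A hedged sketch against the ResourceManager REST API (hostname, port and application id are placeholders): the scheduler endpoint reports which scheduler is active, and the per-application endpoint shows the queue a finished application ran in, for as long as the RM still retains it.

    import requests

    rm = "http://resourcemanager.example.com:8088"

    # 1) The active scheduler type (fifo/capacity/fair) is reported here.
    sched = requests.get(rm + "/ws/v1/cluster/scheduler").json()
    print(sched["scheduler"]["schedulerInfo"]["type"])

    # 2) Per-application details, including the queue it was submitted to.
    app = requests.get(rm + "/ws/v1/cluster/apps/application_1533550286830_0001").json()
    print(app["app"]["queue"], app["app"]["state"])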
1 vote · 3 answers · 891 views

Hive JDBC connection problems

I am trying to connect to a HiveServer2 instance via JDBC with Kerberos authentication. After numerous attempts to make it work, I can't get it working with the Cloudera driver. If someone can help me solve the problem, I would greatly appreciate it. I have this method: private Connection establishConnectio...
Gary Greenberg
1 vote · 0 answers · 198 views

Configuring CDH cluster with Python 3

We are using the CDH 5.8.3 community version and we want to add support for Python 3.5+ to our cluster. I know that Cloudera and Anaconda provide a parcel to support Python, but that parcel ships Python 2.7. What is the recommended way to enable Python 3+ on a CDH cluster?
Rohan
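One commonly used approach, sketched below: install a Python 3 interpreter at the same path on every node (the /opt/anaconda3 path is an assumption) and point PySpark at it per job, without touching the cluster-wide default.

    import os
    from pyspark import SparkConf, SparkContext

    os.environ["PYSPARK_PYTHON"] = "/opt/anaconda3/bin/python"          # executors
    os.environ["PYSPARK_DRIVER_PYTHON"] = "/opt/anaconda3/bin/python"   # driver

    conf = (SparkConf()
            .setAppName("python3-on-cdh")
            .set("spark.yarn.appMasterEnv.PYSPARK_PYTHON",
                 "/opt/anaconda3/bin/python"))
    sc = SparkContext(conf=conf)
    print(sc.parallelize(range(10)).sum())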
1 vote · 1 answer · 253 views

How to specify the timestamp format when creating a table over an HDFS directory

I have the following csv file located at the path/to/file in my hdfs store. 1842,10/1/2017 0:02 7424,10/1/2017 4:06 I'm trying to create a table using the below command: create external table t ( number string, reported_time timestamp ) ROW FORMAT delimited fields terminated BY ',' LOCATI...
akilat90
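A common workaround, sketched here with Spark SQL against the Hive metastore: declare the raw column as STRING in the external table, then convert it with unix_timestamp() and a matching pattern when reading (or inside a view). The table name t_raw is a placeholder; the LOCATION reuses the question's elided path.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.enableHiveSupport().getOrCreate()

    spark.sql("""
      CREATE EXTERNAL TABLE IF NOT EXISTS t_raw (number STRING, reported_time STRING)
      ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
      LOCATION 'path/to/file'
    """)

    # Convert the string to a real timestamp on read (or wrap this in a view).
    spark.sql("""
      SELECT number,
             CAST(from_unixtime(unix_timestamp(reported_time, 'M/d/yyyy H:mm'))
                  AS TIMESTAMP) AS reported_time
      FROM t_raw
    """).show()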
1 vote · 0 answers · 49 views

Cloudera to HDP Solr (version 5.5.2) data migration: failed to update Solr indexes after restoration on SolrCloud

Solr version: 5.5.2. My project requirement is to transfer SolrCloud indexes from a Cloudera cluster to an HDP cluster. The data is huge (1 billion indexed records in production), hence re-indexing is not an option. We have tried the Solr backup and restore APIs, but the data is not visible in SolrCloud. Please check i...
Prachi Singh
1 vote · 0 answers · 493 views

Kerberos authentication in Kudu for a Spark 2 job

I am trying to put some data into Kudu, but the worker cannot find the Kerberos token, so I am not able to write to the Kudu database. Here you can see my spark2-submit statement: spark2-submit --master yarn 'spark.yarn.maxAppAttempts=1' --conf 'spark.authenticate=true' --deploy-mode cluster...
Lukas
1 vote · 0 answers · 18 views

Unable to connect to Impala (on AWS) from Talend (on-premise)

I am trying to connect to Impala, which is hosted on AWS, from Talend Open Studio for Big Data, which is installed on an on-premise jump host. When I configure the ImpalaConnection component and try to run it, I get the error below. [statistics] connecting to socket on port 3472 [statist...
user1718857
1 vote · 1 answer · 243 views

Is ZooKeeper running or not, with respect to standard port 2181 usage?

Cloudera Quickstart 5.13. I am not sure whether the out-of-the-box ZooKeeper is running, and if so, whether it works reliably. I got this when trying to run the ZooKeeper bundled with the Kafka distribution I downloaded, in standalone mode: [2018-06-17 00:49:32,847] INFO...
thebluephantom
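A small probe that may help here: ZooKeeper answers four-letter commands on its client port, so checking 2181 directly shows whether the Quickstart VM's own ZooKeeper is already listening before a second, Kafka-bundled instance is started.

    import socket

    def zk_cmd(cmd, host="localhost", port=2181):
        with socket.create_connection((host, port), timeout=5) as s:
            s.sendall(cmd.encode())
            return s.recv(4096).decode()

    print(zk_cmd("ruok"))   # "imok" if the server is up and healthy
    print(zk_cmd("srvr"))   # version, mode (standalone/leader/follower), connections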
1 vote · 1 answer · 144 views

Accessing kerberized WebHDFS on Cloudera from Knox

I have been trying to make Apache Knox work on a Kerberized Cloudera cluster. I downloaded the zip containing Knox, installed it, and made changes specific to my cluster in the sandbox.xml file. However, when I run a cURL command I get a 404 Not Found error. Has someone successfully managed to ru...
Shashank S
1 vote · 1 answer · 97 views

Date/string comparison in Impala doesn't work (always returns false)

I am currently writing an Impala query which essentially groups the data by several columns and takes the values of the remaining columns from the most recent rows. However, as I group the data by date, the query always returns false when comparing the data. My code is as...
1 vote · 0 answers · 228 views

Impala [catalog] and Hive [metastore/Sentry] not in sync

We use Cloudera (CDH 5.7.5) and Hue [3.9.0]. For the admin user, some Hive tables (60%) are accessible through Impala; the other Hive tables are not. For non-admin users, no database is accessible through Impala, and again only some databases are accessible via Hive. Is it because the Impala c...
Mahadi Siregar
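When tables created or altered through Hive don't show up in Impala, refreshing Impala's catalog is the usual first step (Sentry grants are a separate concern); a short impyla sketch with placeholder connection details and names:

    from impala.dbapi import connect

    cur = connect(host="impala-daemon.example.com", port=21050).cursor()

    cur.execute("INVALIDATE METADATA some_db.some_table")  # one table
    # cur.execute("INVALIDATE METADATA")                   # or the whole catalog (heavier)

    cur.execute("SHOW TABLES IN some_db")
    print(cur.fetchall())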
1 vote · 0 answers · 39 views

When trying to format a NameNode in Hadoop using the command bin/hdfs namenode -format, I get an error

18/08/06 06:35:39 INFO namenode.NameNode: STARTUP_MSG: /************************************************************ STARTUP_MSG: Starting NameNode STARTUP_MSG: host = ubuntu/127.0.1.1 STARTUP_MSG: args = [-format] STARTUP_MSG: version = 2.5.0-cdh5.3.2 STARTUP_MSG: classpath = /home/suhaibk...
SUHAIB KHAN
1 vote · 1 answer · 43 views

Broken cluster caused by possible authentication issue

I have a three-node home Cloudera cluster for practice which had been working fine until one day some changes were introduced without my even noticing. Below is the error message: org.apache.thrift.transport.TTransportException: java.net.ConnectException: Connection refused The front-end error i...
Choix
1 vote · 0 answers · 226 views

Hive and Impala showing different roles for user with Sentry installed

I am running Cloudera 5.15 with Kerberos enabled on the cluster. Sentry is installed to configure user access to various tables, databases, etc. Everything is installed and working fine for Hive, but not for Impala. I'm using the Hue web UI for issuing Hive/Impala queries. (I'm getting the same results u...
rb21220689
1 vote · 2 answers · 171 views

Create parameterized view in Impala

My goal is to create a parameterized view in Impala so users can easily change values in a query. If I run the query below in HUE, for example, it is possible to enter a value: SELECT * FROM customers WHERE customer_id = ${id} But I would like to create a view as follows, so that when you run it, it ask...
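Impala views themselves cannot take parameters, so one workaround is to keep the placeholder in a query file and let impala-shell substitute it at run time via --var; a sketch, with the host, file name and value as placeholders:

    import subprocess

    # customers_by_id.sql contains:
    #   SELECT * FROM customers WHERE customer_id = ${var:id}
    subprocess.call([
        "impala-shell",
        "-i", "impala-daemon.example.com",
        "--var", "id=123",
        "-f", "customers_by_id.sql",
    ])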
1 vote · 2 answers · 89 views

How to find the Kudu master name or port for the Kudu DB in my Cloudera cluster?

I am trying to write a Spark DataFrame to a Kudu database, but I do not know the Kudu master. The cluster I am using is a Cloudera cluster. How do I find the Kudu master in the cluster?
Karthik reddy
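For reference, a hedged kudu-spark sketch: the master address to use is the host Cloudera Manager lists under the Kudu service's Master role (RPC port 7051 by default). The hostname, table name, sample DataFrame and the availability of the kudu-spark2 package on the classpath are all assumptions here.

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("kudu-write").getOrCreate()
    df = spark.createDataFrame([(1, "alice"), (2, "bob")], ["id", "name"])

    (df.write
       .format("org.apache.kudu.spark.kudu")
       .option("kudu.master", "kudu-master.example.com:7051")
       .option("kudu.table", "impala::default.my_kudu_table")
       .mode("append")
       .save())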
1 vote · 0 answers · 84 views

Permission problem when using the CDH6 hadoop-common dependency

I use the dependency org.apache.hadoop:hadoop-common:3.0.0-cdh6.0.0 in my Maven project, and when I execute mvn clean install this problem occurs: Failed to collect dependencies at org.apache.hadoop:hadoop-common:jar:3.0.0-cdh6.0.0 -> org.apache.hadoop:hadoop-auth:jar:3.0.0-cdh6.0.0 -> com.nimbusd...
BOT-CC
