Questions tagged [hive]

0 votes · 0 answers · 3 views

Hive Query with multiple Columns in Select and group by one column

I have the below sample image of the dataset and the expected result. What is the best way to achieve this kind of result on a dataset with a billion records? Should we use intermediate temporary tables or do it in one query? Req: Get all the records for the SNs which have more than 2 records in th...
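One common pattern for this (a minimal HiveQL sketch; source_table and the sn column are hypothetical names standing in for the real ones) is a single pass with a windowed count rather than an intermediate table:

  SELECT *
  FROM (
    SELECT t.*,
           COUNT(*) OVER (PARTITION BY sn) AS sn_cnt   -- number of rows per SN
    FROM source_table t
  ) x
  WHERE x.sn_cnt > 2;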
Sam
1 vote · 1 answer · 33 views

Data load from HDFS to ES taking very long time

I have created an external table in Hive and need to move the data to ES (2 nodes, each with 1 TB). The regular query below takes a very long time (more than 6 hours) for a source table with 9 GB of data. INSERT INTO TABLE . SELECT COL1, COL2, COL3..., COL10 FROM .; The ES index has the default 5 shard...
RAVITEJA SATYAVADA
5 votes · 1 answer · 46 views

Hive CSV line delimiter configuration

When creating an external table on a CSV file using Hive, you can either use the Hive-internal CSV Serde: ... ROW FORMAT DELIMITED FIELDS TERMINATED BY ',' STORED AS TEXTFILE LOCATION '...' TBLPROPERTIES('serialization.null.format'='') or the OpenCSV Serde: ROW FORMAT SERDE 'org.apache.hadoop.hive.s...
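For reference, a minimal sketch of the OpenCSV variant (column names and location are placeholders); note that for text tables Hive's LINES TERMINATED BY currently only accepts '\n', so the line delimiter itself is not really configurable:

  CREATE EXTERNAL TABLE csv_table (col1 STRING, col2 STRING)
  ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde'
  WITH SERDEPROPERTIES ('separatorChar' = ',', 'quoteChar' = '"', 'escapeChar' = '\\')
  STORED AS TEXTFILE
  LOCATION '/path/to/csv/';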
Markus Appel
1 vote · 2 answers · 2.2k views

Sqoop incremental export using hcatalog?

Is there a way to use Sqoop to do incremental exports? I am using HCatalog integration for Sqoop. I tried using the --last-value and --check-column options, which are used for incremental import, but Sqoop gave me an error that the options were invalid.
VoodooChild
0 votes · 0 answers · 4 views

com.tableausoftware.jdbc.TableauJDBCException: Error reading metadata for prepared query

When I use Tableau's 'order jdbc' connection to connect to HDP 3.1 Hive I get this error, but extract works.
IvanLeung
1 vote · 0 answers · 8 views

Hive question - Rank() OVER (PARTITION BY dept ORDER BY sum(salary))

I am trying to understand how to use the rank() over(partition by ) in Apache Hive, but have problems getting the results I desire. All the way at the bottom of the post is the dataset that I am working with. What I am trying to do is to come up with a statement that will uniquely rank the departmen...
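A minimal sketch of the usual pattern (an employees table with dept and salary columns is assumed here): aggregate first, then rank the aggregated totals. With PARTITION BY dept each partition holds a single aggregated row, so every department would rank 1; ranking departments against each other means ordering over the whole result instead:

  SELECT dept,
         SUM(salary) AS total_salary,
         RANK() OVER (ORDER BY SUM(salary) DESC) AS dept_rank
  FROM employees
  GROUP BY dept;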
Dora Chua
0 votes · 0 answers · 5 views

spark read from hiveserver2 (JDBC ) Remote Cluster

I have a requirement to read from a Hive source table in a different cluster, and I am trying to research how this can be achieved. I am planning to use a HiveServer2 (JDBC) connection as an option. Can someone please point me to some sample code or a reference URL? I tried using 'hive-jdbc.jar', and be...
Abhijeet Rajput
0 votes · 0 answers · 7 views

Converting MySQL query to Hive

I am trying to convert the following MySQL query to Hive. MySQL query: SELECT departments.dept_name, dept_emp.dept_no, gender, (count(*)/(select count(*) from employees)) AS Sex FROM employees, dept_emp, departments WHERE dept_emp.dept_no = departments.dept_no AND dept_emp.emp_no = employees.emp_no GROUP BY...
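Hive is generally happier with explicit JOIN syntax and without a correlated scalar subquery in the SELECT list; one hedged rewrite (column names are taken from the excerpt, the ratio logic is an assumption) computes the overall employee count once in a cross-joined subquery:

  SELECT d.dept_name,
         de.dept_no,
         e.gender,
         COUNT(*) / MAX(tot.total_emp) AS sex   -- MAX() because total_emp is not in the GROUP BY
  FROM employees e
  JOIN dept_emp de   ON de.emp_no = e.emp_no
  JOIN departments d ON d.dept_no = de.dept_no
  CROSS JOIN (SELECT COUNT(*) AS total_emp FROM employees) tot
  GROUP BY d.dept_name, de.dept_no, e.gender;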
2 votes · 0 answers · 32 views

Import MongoDB data into Hive Error: Splitter implementation is incompatible

I'm trying to import MongoDB data into Hive. The JAR versions that I have used are ADD JAR /root/HDL/mongo-java-driver-3.4.2.jar; ADD JAR /root/HDL/mongo-hadoop-hive-2.0.2.jar; ADD JAR /root/HDL/mongo-hadoop-core-2.0.2.jar; And my cluster versions are Ambari - Version 2.6.0.0, HDFS 2.7.3, Hive 1....
Bunny
1 vote · 1 answer · 9 views

Query taking time despite adding session settings

Following is the ETL-generated query: SELECT infaHiveSysTimestamp('SS') as a0, 7991 as a1, single_use_subq30725.a1 as a2, SUBSTR(SUBSTR(single_use_subq30725.a2, 0, 5), 0, 5) as a3, CAST(1 AS SMALLINT) as a4, single_use_subq30725.a3 as a5, single_use_subq30725.a4 as a6, SUBSTR(SUBSTR(SUBSTR(s...
Kumar
1 vote · 3 answers · 367 views

unable to create hive table

I am unable to create a Hive table. Following is the code: CREATE TABLE NYSE(exchange STRING, stock_symbol STRING, stock_date DATE, stock_price_open FLOAT, stock_price_high FLOAT, stock_price_low FLOAT, stock_price_close FLOAT, stock_volume INT, stock_price_avg_close FLOAT) ROW FORMAT DELIMITED F...
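Since the error text is cut off, this is only a guess, but exchange is a reserved word in recent Hive versions, and quoting it with backticks (or renaming the column) is the usual fix. A sketch (the ROW FORMAT clause below is a placeholder for the truncated original):

  CREATE TABLE nyse (
    `exchange`            STRING,
    stock_symbol          STRING,
    stock_date            DATE,
    stock_price_open      FLOAT,
    stock_price_high      FLOAT,
    stock_price_low       FLOAT,
    stock_price_close     FLOAT,
    stock_volume          INT,
    stock_price_avg_close FLOAT)
  ROW FORMAT DELIMITED FIELDS TERMINATED BY ',';   -- placeholder; keep the original ROW FORMAT clause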
Sushil
1 vote · 2 answers · 1.5k views

Join multiple tables in Hive

Below is the data set Table1 col1,col2 key1,k1 key2,k2 key3,k3 Table2 col1,col3 key1,k11 key2,k22 key4,k44 Table3 col1,col4 key1,k111 key2,k222 key5,k555 I need to join the 3 tables based on col1. Below is my query select a.col1,a.col2,b.col3,c.col4 from table1 a full outer join table2 b full outer...
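One hedged way to write this so that keys missing from table1 still line up (a sketch based on the columns in the excerpt) is to join the third table on the coalesced key and coalesce the key in the output as well:

  SELECT COALESCE(a.col1, b.col1, c.col1) AS col1,
         a.col2, b.col3, c.col4
  FROM table1 a
  FULL OUTER JOIN table2 b ON a.col1 = b.col1
  FULL OUTER JOIN table3 c ON COALESCE(a.col1, b.col1) = c.col1;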
1 vote · 2 answers · 481 views

Migrate Hive tables to redshift

Let me explain a bit the scenario: I have hundreds of hive tables stored on S3 (ORC, Parquet), so just to be clear no HDFS. Now, I am interested in migrating some of them to Redshift to run some performance tests. I know that redshift does not support ORC, Parquet so I need to create some CSV/JSON t...
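On the Hive side, one hedged way to produce delimited copies without creating new managed tables is INSERT OVERWRITE DIRECTORY (this assumes the cluster can write to S3 directly; bucket, table and role names are placeholders), after which Redshift's COPY can load the files:

  -- Hive: dump the ORC/Parquet table as tab-delimited text on S3
  INSERT OVERWRITE DIRECTORY 's3://my-bucket/exports/my_table/'
  ROW FORMAT DELIMITED FIELDS TERMINATED BY '\t'
  SELECT * FROM my_table;

  -- Redshift: load the exported files
  COPY my_table
  FROM 's3://my-bucket/exports/my_table/'
  IAM_ROLE 'arn:aws:iam::123456789012:role/redshift-copy'
  DELIMITER '\t';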
Edge7
1 vote · 2 answers · 900 views

Subtracting days from current_timestamp() in Hive

I want to get the timestamp that is exactly 10 days before the current timestamp in Hive. I can get the current timestamp using the function current_timestamp() (I don't want to use unix_timestamp() here because it's deprecated in recent versions of Hive). So, how do I get the timestamp which...
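A minimal sketch, assuming Hive 1.2 or later where INTERVAL literals are available; interval arithmetic keeps the time-of-day component, whereas date_sub() would return only a date:

  SELECT current_timestamp() - INTERVAL '10' DAY AS ts_10_days_ago;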
Dinesh Raj
1 vote · 1 answer · 1.3k views

JDBC to hive connection fails on invalid operation isValid()

I have followed this doc to try to set up a JDBC connection to Hive, but Eclipse shows this error. I can't seem to figure out what it exactly means, and the connection with the appropriate username and password works in Beeline, so it's not an authentication problem. Below is the error I'm facing: > 15/11/2...
Codex
1 vote · 4 answers · 9.5k views

What does the hive-site.xml included in $SPARK_HOME look like?

I am a beginner at Hive. Something goes wrong (cannot find table) when I start a Spark job and read data from Hive. I have not set hive-site.xml in $SPARK_HOME/conf; is that the problem? The spark-submit command is: bin/spark-submit --master local[*] --driver-memory 8g --executor-memory 8g --class com.ctrip.ml.clie...
hash-X
1 vote · 1 answer · 1.2k views

spark-1.5.1 throwing out of memory error for hive 1.2.0 using HiveContext in java code

I have spark-1.5.1 for Hadoop 2.6 running in standalone mode on my local machine. I am trying to run a Hive query from a sample Java application, pointing spark.master to the Spark master (spark://impetus-i0248u:7077) running on my local machine. Here is the piece of Java code: SparkConf sparkconf =...
Reena Upadhyay
1 vote · 1 answer · 1.9k views

Hive Query fails for HDFS user

If I run the Hive shell as myself I can query tables, but if I run the Hive shell using sudo -u hdfs hive then all my queries fail with the error message Application application_1447966350718_10654 failed 2 times due to AM Container for appattempt_1447966350718_10654_000002 exited with exitCode: -1000 F...
Knows Not Much
1 vote · 2 answers · 2.8k views

Issue with Hive SerDe dealing with nested structs

I am trying to load a huge volume of JSON data with a nested structure into Hive using a JSON SerDe. Some of the field names in the nested structure start with $. I am mapping Hive field names using SERDEPROPERTIES, but when I query the table I get null in the fields starting with $. Tried with diff...
Simi
1 vote · 1 answer · 2.7k views

hive creating table duplicate column name error

I am trying to analyze the Twitter data. When I tried to create a table by using the following command: hive> CREATE external TABLE tweets ( retweeted boolean, createpapa string, place string, text string, retweeted_status STRUCT, created_at string, place string, text string, entitles STRUCT, sou...
Nikitha JV
1 vote · 1 answer · 517 views

I have a json file and I want to create Hive external table over it but with more descriptive field names

I have a JSON file and I want to create a Hive external table over it but with more descriptive field names. Basically, I want to map the less descriptive field names present in the JSON file to more descriptive fields in the Hive external table. e.g. {'field1':'data1','field2':100} Hive Table: Create External...
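If the org.openx.data JSON SerDe is an option, its mapping.* serde properties are one way to do this; the sketch below uses the field names from the example and hypothetical descriptive column names:

  CREATE EXTERNAL TABLE json_data (
    descriptive_name  STRING,
    descriptive_count INT)
  ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
  WITH SERDEPROPERTIES (
    'mapping.descriptive_name'  = 'field1',   -- Hive column <- JSON attribute
    'mapping.descriptive_count' = 'field2')
  STORED AS TEXTFILE
  LOCATION '/path/to/json/';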
Palak Sukant
1 vote · 1 answer · 2.5k views

cannot load hivecontext in spark zeppelin

I have installed Zeppelin. Everything is working except when I try to import a Hive context. My configuration on Zeppelin: System.getenv().get('MASTER') System.getenv().get('SPARK_YARN_JAR') System.getenv().get('HADOOP_CONF_DIR') System.getenv().get('JAVA_HOME') System.getenv().get('SPARK_HOME')...
patpat
1 vote · 1 answer · 14 views

Replicate constant output based on the occurrence of specific events

I have a table with events (say X, Y, Z are random events and A, B are the ones I want to track). If I find event A, I want to output 1 on the current and following rows and if I find B I output -1 on the current and following rows, before I find any of them (A or B) I output 0. How do I do that usi...
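One hedged way to express this with Hive windowing (it assumes an ordering column such as event_time exists, and relies on last_value's optional skip-nulls flag): mark the A/B rows and carry the most recent mark forward, defaulting to 0 before any mark appears:

  SELECT e.*,
         COALESCE(
           LAST_VALUE(CASE WHEN event = 'A' THEN 1
                           WHEN event = 'B' THEN -1 END, TRUE)   -- TRUE = skip nulls
             OVER (ORDER BY event_time
                   ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW),
           0) AS flag
  FROM events e;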
Thiago Balbo
6 votes · 5 answers · 474 views

Hive update with subquery

I'm trying to update a Hive table from a subquery, and I know Hive doesn't support such updates. Is there any work-around for this? My update looks like this: UPDATE tmp_aka SET guid = (SELECT mguid FROM tmp_maxs WHERE tmp_maxs.guid = tmp_aka.guid);
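One common work-around on non-ACID tables is to rewrite the table with the joined result instead of updating it in place (a sketch; the extra columns are placeholders for whatever else tmp_aka holds). On Hive 2.2+ ACID tables, MERGE is another option:

  INSERT OVERWRITE TABLE tmp_aka
  SELECT COALESCE(m.mguid, a.guid) AS guid,   -- take mguid when a match exists
         a.other_col1, a.other_col2           -- placeholders for the remaining tmp_aka columns
  FROM tmp_aka a
  LEFT JOIN tmp_maxs m ON m.guid = a.guid;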
hlagvankar
0 votes · 0 answers · 4 views

Hive - Update records in a table with today's date IF they are not found in another table?

I currently have a main result table (test1) that stores all of my issue records. I have a second table (test2) that is refreshed every week or so, and I am trying to find the records that do not exist in the weekly update and then update the date in the main result table, as that is when it got updat...
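Without ACID updates, a common work-around is to rewrite the main table, stamping today's date on rows that have no match in the weekly table (a sketch; issue_id, issue_details and last_seen_date are hypothetical column names):

  INSERT OVERWRITE TABLE test1
  SELECT t1.issue_id,
         t1.issue_details,
         CASE WHEN t2.issue_id IS NULL THEN current_date   -- not in this week's load
              ELSE t1.last_seen_date END AS last_seen_date
  FROM test1 t1
  LEFT JOIN test2 t2 ON t2.issue_id = t1.issue_id;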
Sam
1 vote · 2 answers · 585 views

Parsing date format to join in hive

I have a date field which is of type String and in the format: 03/11/2001 And I want to join it with another column, which is in a different String format: 1855-05-25 12:00:00.0 How can I join both columns efficiently in hive, ignoring the time part of the second column? My query looks like below:...
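One hedged approach (table and column names are placeholders, and it assumes the first format is MM/dd/yyyy rather than dd/MM/yyyy) is to normalize both sides to yyyy-MM-dd before comparing:

  SELECT a.*, b.*
  FROM table_a a
  JOIN table_b b
    ON from_unixtime(unix_timestamp(a.dt1, 'MM/dd/yyyy'), 'yyyy-MM-dd') = to_date(b.dt2);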
Aman
1 vote · 1 answer · 1.7k views

Cannot query parquet file created by Spark

Created a parquet file in Spark. Here is the code snippet parquet_file_name = os.path.join(partition, os.path.basename(fileLocation) + '.parquet') dfData = sqlContext.createDataFrame(addedColumns, schema) dfData.save(parquet_file_name, 'parquet', 'append') I can read the file contents in Spark. In [...
mdem
1 vote · 2 answers · 826 views

ESRI Hive ST_Contains does not work properly

Trying this with the JARs I could find (not sure they are the best choice for this, I needed to use ESRI and do it in Hive): ADD JAR /home/user/lib/esri-geometry-api-1.2.1.jar; ADD JAR /home/user/lib/spatial-sdk-hive-1.1.1-SNAPSHOT.jar; ADD JAR /home/user/lib/esri-geometry-api.jar; ADD JAR /home/use...
mel
1 vote · 2 answers · 2k views

How to connect Kerberized Hive via ODBC and avoid the “No credentials cache found” error

I am trying to connect to a HiveServer2 (Hive 0.14 from HDP 2.2) on a kerberized cluster from a windows machine using ODBC. I have followed the guide at http://hortonworks.com/wp-content/uploads/2014/05/Product-Guide-HDP-2.1-v1.01.pdf When I try to test my ODBC connection (using the 'Test' button in...
Thomas Larsson Kron
1 vote · 2 answers · 7.3k views

sql select priority based on multiple columns

Here is some sample data. I'm trying to get a single record for each UserID for the most recent activity date. If a user watched more than one movie on a given date, the record should be selected based on the priority associated with the movie name. UserID MovieName ActivityDate 1 MOV1 2015-02-12 2...
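A sketch of one way to express "latest date first, then movie priority" (the CASE-based priority mapping is a placeholder; a real priority lookup table could be joined in instead):

  SELECT UserID, MovieName, ActivityDate
  FROM (
    SELECT t.*,
           ROW_NUMBER() OVER (
             PARTITION BY UserID
             ORDER BY ActivityDate DESC,
                      CASE MovieName WHEN 'MOV1' THEN 1
                                     WHEN 'MOV2' THEN 2
                                     ELSE 99 END) AS rn
    FROM user_activity t        -- hypothetical table name
  ) x
  WHERE rn = 1;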
tetrathinker
1 vote · 2 answers · 867 views

Automatic login using beeline

I am using Beeline as a client to access Hive databases. Every time I use Beeline, it asks me for the connection URL, username and password. Is there a way to set these parameters in a configuration file and load them automatically instead of re-typing them for each login?
Yehia Elshater
1 vote · 1 answer · 76 views

Multiple WHERE subqueries in Hive don't work

I have a query like below: SELECT T.MTH_END_DT, T.SRC_SYS_CD, T.BTCH_ID FROM PROD_RCRR.BAL_CNTRL_LOG T WHERE T.SRC_SYS_CD='SL' AND T.MTH_END_DT in (SELECT(MAX(MTH_END_DT)) FROM PROD_RCRR.BAL_CNTRL_LOG) AND T.BTCH_ID in (SELECT(MAX(BTCH_ID )) FROM PROD_RCRR.BAL_CNTRL_LOG) An error message shows Hive o...
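Older Hive versions typically allow only one IN/EXISTS subquery per query; a hedged rewrite is to compute both maxima once in a cross-joined single-row subquery and compare against them directly:

  SELECT T.MTH_END_DT, T.SRC_SYS_CD, T.BTCH_ID
  FROM PROD_RCRR.BAL_CNTRL_LOG T
  CROSS JOIN (SELECT MAX(MTH_END_DT) AS max_dt,
                     MAX(BTCH_ID)    AS max_btch
              FROM PROD_RCRR.BAL_CNTRL_LOG) m
  WHERE T.SRC_SYS_CD = 'SL'
    AND T.MTH_END_DT = m.max_dt
    AND T.BTCH_ID    = m.max_btch;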
Rachael Li
1 vote · 1 answer · 3k views

Float vs Double data type in Hive

As per the Hive's documentation: FLOAT (4-byte single precision floating point number) DOUBLE (8-byte double precision floating point number) What does 4-byte or 8-byte single precision floating point number mean?
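In short, FLOAT is IEEE 754 single precision (32 bits, roughly 7 significant decimal digits) and DOUBLE is IEEE 754 double precision (64 bits, roughly 15-16 significant digits). A quick illustration of the difference:

  SELECT CAST(0.1 AS FLOAT)  AS as_float,    -- ~7 digits of precision
         CAST(0.1 AS DOUBLE) AS as_double;   -- ~15-16 digits of precision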
dev ツ
1 vote · 1 answer · 9.4k views

Inserting into Hive table - Non Partitioned table to Partitioned table - Cannot insert into target table because column number/types

When I try to insert into a partitioned table I get the below error: SemanticException [Error 10044]: Line 1:23 Cannot insert into target table because column number/types are different ''US'': Table insclause-0 has 2 columns, but query has 3 columns. My input data: 1,aaa,US 2,bbb,US 3,cc...
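Two usual fixes, sketched with hypothetical table and column names: either drop the partition column from the SELECT when inserting into a static partition, or switch to dynamic partitioning with the partition column listed last:

  -- static partition: do not select the partition column
  INSERT INTO TABLE target_tbl PARTITION (country = 'US')
  SELECT id, name FROM source_tbl WHERE country = 'US';

  -- dynamic partition: partition column goes last in the SELECT
  SET hive.exec.dynamic.partition.mode = nonstrict;
  INSERT INTO TABLE target_tbl PARTITION (country)
  SELECT id, name, country FROM source_tbl;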
Sachin Sukumaran
1 vote · 1 answer · 53 views

How to return non-empty rows for a given ID - Hive

I have a table X ID A B -------------- 1 abc 27 1 - 28 2 - 33 3 xyz 41 3 - 07 I need output as ID A B -------------- 1 abc 27 2 - 33 3 xyz 41 I tried doing max(A) OVER (PARTITION BY ID) as the_value but it did not work. I can s...
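One hedged alternative, assuming '-' is a literal placeholder string rather than NULL: rank the rows so that non-'-' values of A win, then keep one row per ID:

  SELECT id, a, b
  FROM (
    SELECT x.*,
           ROW_NUMBER() OVER (
             PARTITION BY id
             ORDER BY CASE WHEN a = '-' THEN 1 ELSE 0 END) AS rn   -- prefer rows where A is filled
    FROM x
  ) t
  WHERE rn = 1;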
underwood
1 vote · 1 answer · 62 views

create table temp2 select * from temp1 is not taking all the properties of the source table in hive 0.14

I am trying to create a table (temp2) from another table (temp1) using the procedure below, and the table is getting created, but a few properties are missing in the temp2 table, as in the example below. Properties missing in table temp2 are: field.delim '\t' --- This is missing serialization.format '\t' ---...
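CTAS builds the new table with default storage settings; if the goal is an exact structural copy, a hedged alternative is CREATE TABLE LIKE followed by an INSERT:

  -- copies the schema, serde, delimiters and storage format, but no data
  CREATE TABLE temp2 LIKE temp1;

  -- then copy the data
  INSERT INTO TABLE temp2
  SELECT * FROM temp1;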
goks
1 vote · 2 answers · 2.4k views

"Unauthorized connection for super-user: hcat" when trying to query Hive through WebHCat

I'm trying to execute a Hive query using WebHCat / Templeton. I POST my query to /templeton/v1/hive with the 'execute' parameter set equal to my query (a simple 'select count(*)' query for now). But when I do this, I always get back this error: {'error':'Unauthorized connection for super-user: hcat...
mindcrime
1 vote · 1 answer · 47 views

Chaining joins in SQL based on dynamic table

The title may not be accurate for the question but here goes! I have the following table: id1 id2 status 1 2 a 2 3 b 3 4 c 6 7 d 7 8 e 8 9 f 9 10 g I would like to get the first id1 and last status based on a dynamic chain j...
invoker
1 vote · 1 answer · 631 views

Unable to use TotalOrderPartitioner with Hive: Can't read partitions file

We are trying to generate HBase HFiles for bulk loading from Hive. Our main problem is that when using org.apache.hadoop.mapred.lib.TotalOrderPartitioner, it cannot find the custom partitions file: java.lang.IllegalArgumentException: Can't read partitions file Further details: A custom par...
kentt
1 vote · 4 answers · 75 views

How to find maximum value and its reference name from hive table?

I have a Hive table 'airline' like this: name airline USA American Airline Nepal Jet Airline Dubai Emirates USA SouthWestern USA Quatar USA Delta Now, I want to know which country has the highest number of airlines. I am using nested subqueries: select max(tot) from (sel...
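One hedged way to get both the count and the name that owns it in a single statement (column names follow the excerpt, where name appears to hold the country):

  SELECT name, airline_cnt
  FROM (
    SELECT name,
           COUNT(*)                             AS airline_cnt,
           RANK() OVER (ORDER BY COUNT(*) DESC) AS rnk
    FROM airline
    GROUP BY name
  ) t
  WHERE rnk = 1;   -- RANK keeps ties; use ORDER BY with LIMIT 1 for a single row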
dev
