Questions tagged [hive]

1 vote, 1 answer, 465 views

Unable to query Parquet data with nested fields in Presto DB

I have data, some of which includes nested columns (arrays of arrays of objects), saved as Parquet in Spark 2.2. Now I'm trying to access this data externally with Presto, and I get the following exception when I try to access any nested column: com.facebook.presto.spi.PrestoException: Error opening...
mixermt
1 vote, 1 answer, 57 views

Why is a boolean field not working in Hive?

I have a column in my Hive table whose datatype is boolean. When I try to import data from CSV, it is stored as NULL. This is my sample table: CREATE TABLE IF NOT EXISTS Engineanalysis( EngineModel String, EnginePartNo String, Location String, Position String, InspectionReq boolean) ROW FORMAT DELI...
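
A common workaround for this kind of CSV-to-boolean mismatch is to land the flag as STRING and derive the boolean on read; the sketch below reuses the question's column names, while the accepted spellings of "true" are an assumption.

    -- Raw staging table: keep the CSV flag as STRING so nothing is silently nulled.
    CREATE TABLE IF NOT EXISTS engineanalysis_raw (
      enginemodel   STRING,
      enginepartno  STRING,
      location      STRING,
      position      STRING,
      inspectionreq STRING)
    ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
    STORED AS TEXTFILE;

    -- Derive a BOOLEAN when reading; adjust the accepted literals to match the CSV.
    SELECT enginemodel,
           enginepartno,
           CASE WHEN lower(trim(inspectionreq)) IN ('true', '1', 'yes')
                THEN true ELSE false END AS inspectionreq_bool
    FROM engineanalysis_raw;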
1 vote, 2 answers, 85 views

How to identify repeated occurrences of a string column in Hive?

I have a view like this in Hive:
id sequencenumber appname
242539622 1 A
242539622 2 A
242539622 3 A
242539622 4 B
242539622 5 B
242539622 6 C
242539622...
Isaac
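
A hedged sketch of one way to flag repeats with a window function; the view name is hypothetical, the column names follow the question.

    -- Mark rows whose appname repeats the immediately preceding row's value
    -- within the same id, ordered by sequencenumber.
    SELECT id,
           sequencenumber,
           appname,
           CASE WHEN appname = lag(appname) OVER (PARTITION BY id ORDER BY sequencenumber)
                THEN 1 ELSE 0 END AS repeats_previous
    FROM my_view;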
1 vote, 1 answer, 35 views

Add multiple sheets to an existing Excel file using pandas in Python

My code reads SQL queries from a text file and executes them one by one in Python. I am trying to save the result of each query in the same Excel file but in different tabs/worksheets. import pyodbc as hive import pandas as pd filename =r'C:\Users\krkg039\Desktop\query.txt' fd=open(filename,'r') sqlFile=fd.read(...
Anurag Chand
0 votes, 0 answers, 3 views

Hive: how to skip NULL or blank strings with concat_ws

Is there a way to skip the null separator while using concat_ws? I have data that is populating like ',20000' and I want to remove the comma for the single population. E.g.: ID value 1 AAA 1 BBBB 2 2 CCCC 3 AAA 4 CCCD 4 DEDED 4 Current result: after using concat_ws with ',' as separator and c...
Seshi Kumar
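
concat_ws already skips NULL inputs, so one possible fix (a sketch with assumed table and column names) is to map blank strings to NULL before aggregating:

    -- Blank strings become NULL, which both collect_list and concat_ws ignore,
    -- so no stray ',' separators are emitted.
    SELECT id,
           concat_ws(',',
                     collect_list(CASE WHEN `value` IS NULL OR trim(`value`) = ''
                                       THEN NULL ELSE `value` END)) AS value_csv
    FROM my_table
    GROUP BY id;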
1 vote, 1 answer, 5k views

How to compute the intersections and unions of two arrays in Hive?

For example, the intersection select intersect(array('A','B'), array('B','C')) should return ['B'] and the union select union(array('A','B'), array('B','C')) should return ['A','B','C']. What's the best way to do this in Hive? I have checked the Hive documentation, but cannot find any relevant info...
Osiris
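
Older Hive releases have no built-in array intersect/union functions; a plain-HiveQL workaround, sketched here against a hypothetical table t(id, a1 ARRAY<STRING>, a2 ARRAY<STRING>), is to explode and re-aggregate with collect_set:

    -- Intersection: elements of a1 that also appear in a2.
    SELECT id, collect_set(x) AS arr_intersect
    FROM t LATERAL VIEW explode(a1) e AS x
    WHERE array_contains(a2, x)
    GROUP BY id;

    -- Union: distinct elements from either array, exploding each side separately.
    SELECT id, collect_set(x) AS arr_union
    FROM (
      SELECT id, x FROM t LATERAL VIEW explode(a1) e AS x
      UNION ALL
      SELECT id, x FROM t LATERAL VIEW explode(a2) e AS x
    ) u
    GROUP BY id;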
1 vote, 2 answers, 5.9k views

Hive tables not found in Spark SQL - spark.sql.AnalysisException in Cloudera VM

I am trying to access Hive tables through a Java program, but it looks like my program is not seeing any table in the default database. I can, however, see the same tables and query them through spark-shell. I have copied hive-site.xml into the Spark conf directory. The only difference: the spark-shell is running...
Joydeep
1 vote, 3 answers, 45 views

Hive: What Happens if I Manually Copy Data Files into Location Folder of a Table?

I have tried copying data files into the location folder of a table (rather than using the load command), and it works in the sense that I can query the new data. However, all sources that I have seen always use the load command to do this; they never talk about copying data files directly to the locati...
user1888243
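
For an unpartitioned table the copied files are picked up automatically (as the question observes); for a partitioned table the metastore has to be told about new partition directories. A sketch with placeholder names:

    -- Discover all partition directories that were copied in manually...
    MSCK REPAIR TABLE my_table;
    -- ...or register a single partition explicitly.
    ALTER TABLE my_table ADD IF NOT EXISTS PARTITION (dt='20190101');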
1 vote, 2 answers, 43 views

How to efficiently perform union of two queries with and without group by

I have a query that performs a union between two select statements, one that uses group by and another that doesn't. The problem is I'm selecting the same columns and using the same functions in both select statements. It feels like I'm duplicating the code and I wish to know if there's a better way to wri...
Aravind Balaji
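
One way to avoid duplicating two near-identical selects, sketched with hypothetical column names, is to let a single GROUP BY produce both aggregation levels via GROUPING SETS:

    -- Produces the detailed (col1, col2) rows and the col1-only rollup rows in one pass.
    SELECT col1, col2, sum(metric) AS total_metric
    FROM my_table
    GROUP BY col1, col2
    GROUPING SETS ((col1, col2), (col1));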
1 vote, 1 answer, 91 views

Loading Avro Data into BigQuery via command-line?

I have created an Avro Hive table and loaded data into it from another table using a Hive INSERT OVERWRITE command. I can see the data in the Avro Hive table, but when I try to load it into a BigQuery table, it gives an error. Table schema: CREATE TABLE `adityadb1.gold_hcth_prfl_datatype_accepten...
Vishwanath Sharma
1 vote, 0 answers, 162 views

LDAP User/Group filter HIVE

I have an LDAP server and a group. Now I want to do LDAP authentication for Hive on AWS using that group. Please find the details below: **CN=hadoop-admins OU=Groups,OU=Root DC=int,DC=domain,DC=com** I have put the values in the following Hive properties: hive.server2.authentication.ldap.groupDNPatt...
Aditya Tiwari
1 vote, 1 answer, 1.2k views

How to merge small files in spark while writing into hive orc table

I am reading CSV files from S3 and writing into a Hive table as ORC. While writing, it produces a lot of small files. I need to merge all these files. I have the following properties set: spark.sql('SET hive.merge.sparkfiles = true') spark.sql('SET hive.merge.mapredfiles = true') spark.sql('SET hive.mer...
doitright
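
The hive.merge.* settings generally only affect jobs launched by Hive itself; when Spark writes the table directly, one commonly used SQL-side workaround (table and column names here are assumptions) is to shuffle on the partition column so each partition is written by a single task:

    -- DISTRIBUTE BY groups each dt value into one task, so the insert emits
    -- roughly one ORC file per partition instead of many small ones.
    INSERT OVERWRITE TABLE target_orc_table PARTITION (dt)
    SELECT col1, col2, dt
    FROM staging_csv_table
    DISTRIBUTE BY dt;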
1 vote, 1 answer, 687 views

How to combine multiple ORC files (belonging to each partition) in a Partitioned Hive ORC table into a single big ORC file

I have a partitioned ORC table in Hive. After loading the table with all possible partitions I get on HDFS - multiple ORC files i.e. each partition directory on HDFS has an ORC file in it. I need to combine all these ORC files under each partition to a single big ORC file for some use-case. Can some...
Anchit Jatana
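
For ORC tables, Hive can merge a partition's small files in place; a sketch with a placeholder table name and partition spec:

    -- Stripe-level merge of the ORC files inside one partition into fewer, larger files.
    ALTER TABLE my_orc_table PARTITION (dt='2018-01-01') CONCATENATE;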
1 vote, 1 answer, 14 views

Use regexp_extract to retrieve the score number in a string text column

I need to extract the float number after score. {'reason_desc': { 'score':'0.1', 'numOfIndicatrix':'0', 'indicatrix':[]}, 'success':true, 'id':'1555039965661065S427A2DCF5787920' } I expect an output of 0.1, or whatever number is enclosed by ''.
JYWQ
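
A sketch assuming the JSON-like text sits in a STRING column named payload of a hypothetical table:

    -- Capture the digits (and dot) between 'score':' and the closing quote.
    SELECT regexp_extract(payload, "'score':'([0-9.]+)'", 1) AS score
    FROM my_table;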
1 vote, 1 answer, 823 views

Pull data from RDS MySQL db using pyspark

I am using PySpark for the first time. I am trying to pull data from an RDS MySQL database using the code below. I have referred to the following links: pyspark mysql jdbc load An error occurred while calling o23.load No suitable driver, https://www.supergloo.com/fieldnotes/spark-sql-mysql-python-example-jdbc/ and...
user15051990
1 vote, 0 answers, 483 views

How to execute Hive queries on Hive 2.1.1 on Spark 2.2.0?

Simple queries, e.g. select, work fine, but when I use aggregate functions, e.g. count, I get errors. I use Beeline to connect to Hive 2.1.1 with Spark 2.2.0 and Hadoop 2.8. hive-site.xml is as follows: hive.execution.engine spark Expects one of [mr, tez, spark]. Chooses execution engine. Options a...
chaithanyaa mallamla
1 vote, 2 answers, 1.1k views

Hive View Not Opening

In the Ambari UI of the Hortonworks sandbox, I was trying to open Hive View through the maria_dev account. However, I was getting the following error: Service Hive check failed: Cannot open a hive connection with connect string jdbc:hive2://sandbox-hdp.hortonworks.com:2181/;serviceDiscovery...
Witty Counsel
1 vote, 1 answer, 358 views

Querying Hive SQL using JDBC or Beeline: how to show progress of running MapReduce jobs

I'm currently encountering a problem: I need to execute Hive SQL and show the progress or detailed info of running queries. I've read the Hive documentation for help, but with hive-jdbc I can only wait until a query ends without any progress information; this is especially unacceptable when executing a large query...
JayZero
1 vote, 1 answer, 620 views

hadoop BlockMissingException

I am getting the error below: Diagnostics: org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-467931813-10.3.20.155-1514489559979:blk_1073741991_1167 file=/user/oozie/share/lib/lib_20171228193421/oozie/hadoop-auth-2.7.2-amzn-2.jar Failing this attempt. Failing the application. Alth...
Pooja Soni
1 vote, 0 answers, 851 views

Airflow HiveOperator Result Set

I'm new to both Airflow and Python, and I'm trying to configure a scheduled report. The report needs to pull data from Hive and email the results. My code thus far: from datetime import datetime, timedelta from airflow import DAG from airflow.operators.hive_operator import HiveOperator default_args...
Myles Wehr
1 vote, 0 answers, 121 views

Counting unique records in HIVE/SQL with many-to-many mapping based on a uniqueness criteria

The problem description is given as follows: A user may have multiple accounts and multiple phone numbers linked to that account. A single account might be linked to one or more phone numbers. A single phone number might be linked to one or more accounts. No two users will have the same account. N...
1 vote, 0 answers, 392 views

AWS EMR Hive fails due to serde2/serde

I am running an EMR Hive query on S3 and it fails saying 'Map operator initialization failed'. I tried to set HADOOP_CLASSPATH as below, still no luck: set HADOOP_CLASSPATH=/usr/lib/hive/lib/*; I am also adding the jar below: add jar /usr/hive/json-serde-1.3.7-jar-with-dependencies.jar. This jar file is presen...
Jay Prakash Tiwari
1 vote, 0 answers, 166 views

Insert JSON file into HBase using Hive

I have a simple JSON file that I would like to insert into an HBase table. My JSON file has the following format: { 'word1':{ 'doc_01':4, 'doc_02':7 }, 'word2':{ 'doc_06':1, 'doc_02':3, 'doc_12':8 } } The HBase table is called inverted_index; it has one column family, matches. I would like to...
Achraf Oussidi
1 vote, 0 answers, 466 views

How to fix HIVE_CURSOR_ERROR on several columns in Athena

I am trying to execute the following select statement in AWS Athena: SELECT col_1, col_2 FROM 'my_database'.'my_table' WHERE partition_1='20171130' AND partition_2='Y' LIMIT 10 And I get an error: Your query has the following error(s): HIVE_CURSOR_ERROR: Can not read value at 0 in block 0 in file s3...
Cherry
1 vote, 0 answers, 162 views

Crystal Reports integration with Hadoop/Hive/HPLSQL

We are migrating data from Oracle to Hadoop, and there is a requirement to continue using the existing reporting tool (Crystal Reports) to generate reports from Hadoop (instead of Oracle). In the current scenario we are using an Oracle stored procedure to do a few aggregations/logic. Now with the above requir...
Nina A
1 vote, 1 answer, 65 views

AND OR SQL operator with multiple records

I have the following query where, if brand1/camp1 is taken individually, the query returns the correct value, but if I specify more than one brand or campaign, it returns some other number and I am not sure what the math behind that is. It is not the total of the two either. I think it is the IN operator that...
RashItIs
1 vote, 0 answers, 149 views

Hive describe shows the partition column as a regular column, but describe formatted doesn't

Hive table created: create external table ini(id string, rand string) partitioned by (tmp string) Describe: describe ini; Output from Hue: Describe formatted: describe formatted ini; Output from Hue: Why is the partition column shown in the column list by Hive's describe table? describe formatted seems to...
Ani Menon
1 vote, 1 answer, 483 views

Launch TDCH to load data from a Hive Parquet table to Teradata

I need to load data from Hive tables, which are stored as Parquet files, into a Teradata database using TDCH (Teradata Connector for Hadoop). I use TDCH 1.5.3, CDH 5.8.3, and Hive 1.1.0. I try to start TDCH using the hadoop jar command and get the error: java.lang.ClassNotFoundException: org.apache.parquet.h...
Dobroff
1 vote, 0 answers, 117 views

Unable to load data from multiple level directories into Hive table

I created a table in the following way: CREATE TABLE `default.tmptbl` (id int, name string) ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.OpenCSVSerde' WITH SERDEPROPERTIES ( 'escapeChar'='\\','quoteChar'='\'','separatorChar'=','); And I have data in HDFS that has been structured in the following way...
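
If the files sit in nested subdirectories under the table location, these session settings are the usual first thing to try (whether they are honored depends on the Hive version and execution engine):

    -- Allow the input format and Hive planner to descend into subdirectories.
    SET mapreduce.input.fileinputformat.input.dir.recursive=true;
    SET hive.mapred.supports.subdirectories=true;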
1 vote, 1 answer, 1.2k views

Can not connect to ZooKeeper/Hive from host to Sandbox Hortonworks HDP VM

I downloaded HDP-Sandbox (in an Oracle VirtualBox VM) a while ago, never used it much, and I'm now trying to access data from the outside world using Hive JDBC. I use hive-jdbc 1.2.2 from Apache, which I got from mvnrepository, with all the dependencies in the classpath, or the Hortonworks JDBC got fr...
Sxilderik
1 vote, 1 answer, 297 views

Hive sql struct mismatch

I have a table with columns like this: table field type(array) item cars(string) isRed(boolean) information(bigint) When I perform the following query select myfield1.isRed from mytable where myfield1.isRed = true I get an error: Argument type mismatch '': The 1st argument of EQUAL is expected to...
jumpman8947
1 vote, 0 answers, 392 views

Redshift External tables via Hive metastore

I have a Redshift DB set up and we do periodic archival of the data into S3. I would like to create Redshift external tables on top of these archived files. AWS documentation suggests that this can be done either via Athena or via a Hive metastore. Since Athena is quite expensive, I would like to get thi...
Sneha
1 vote, 1 answer, 197 views

Insert data into a table using a CSV file in Hive

CREATE TABLE `rk_test22`( `index` int, `country` string, `description` string, `designation` string, `points` int, `price` int, `province` string, `region_1` string, `region_2` string, `taster_name` string, `taster_twitter_handle` string, `title` string, `variety` string, `winery` strin...
Harshit Mehta
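
Typical ways to load the CSV into that table, with placeholder paths:

    -- From the local filesystem of the machine running the Hive client:
    LOAD DATA LOCAL INPATH '/tmp/rk_test22.csv' INTO TABLE rk_test22;
    -- Or move a file that is already in HDFS into the table location:
    LOAD DATA INPATH '/user/hive/staging/rk_test22.csv' INTO TABLE rk_test22;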
1 vote, 0 answers, 233 views

How to compare elements of an array with string in hive

I have created a table with the complex data type array in Hive. The query is create table testivr ( mobNo string, callTime string, refNo int, callCat string, menus array , endType string, duration int, transferNode string ) row format delimited fields terminated by ',' collection items terminated by...
Previnkumar
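
Two common ways to compare array elements with a string, sketched against the question's table; the literal 'MainMenu' is made up:

    -- Membership test on the whole array.
    SELECT * FROM testivr WHERE array_contains(menus, 'MainMenu');

    -- Explode the array when each element has to be examined individually.
    SELECT t.mobNo, m.menu_item
    FROM testivr t LATERAL VIEW explode(t.menus) m AS menu_item
    WHERE m.menu_item = 'MainMenu';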
1 vote, 1 answer, 368 views

Apache hive - How to limit partitions in show command

Is there any way to limit the number of Hive partitions while listing them with the show command? I have a Hive table which has around 500 partitions and I want the latest partition alone. The show command lists all the partitions. I am using this partition to find out the location details. I d...
Aavik
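
SHOW PARTITIONS cannot be limited in most Hive versions; a common workaround (the names and partition value below are placeholders) is to ask for the maximum partition value and then describe just that partition to get its location:

    -- Newest value of the partition column dt.
    SELECT max(dt) FROM my_table;
    -- Location and other details of that single partition.
    DESCRIBE FORMATTED my_table PARTITION (dt='20190101');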
1 vote, 1 answer, 368 views

How can I use an SQL subquery within Spark 1.6

How can I convert the following query to be compatible with Spark 1.6, which does not support subqueries: SELECT ne.device_id, sp.device_hostname FROM `table1` ne INNER JOIN `table2` sp ON sp.device_hostname = (SELECT device_hostname FROM `table2` WHERE device_hostname LIKE CONCAT(ne.device_id,...
user6666914
1 vote, 2 answers, 735 views

Is COUNT() OVER possible using DISTINCT and windowing in Hive?

I want to calculate the number of distinct port numbers that exist between the current row and the X previous rows (sliding window), where X can be any integer. For instance, if the input is:
ID PORT
1 21
2 22
3 23
4 25
5 25
6 21
The outpu...
alejo
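
Hive rejects COUNT(DISTINCT ...) OVER, but collect_set can be used as a window function, and size() of the resulting set gives the distinct count; a sketch with an assumed table name and a 3-row lookback:

    -- Distinct ports among the current row and the 3 preceding rows, ordered by id.
    SELECT id,
           port,
           size(collect_set(port) OVER (ORDER BY id
                                        ROWS BETWEEN 3 PRECEDING AND CURRENT ROW)) AS distinct_ports
    FROM ports_table;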
1 vote, 1 answer, 614 views

How to read hive data from HDFS

I have a Hive warehouse in HDFS at hdfs://localhost:8020/user/hive/warehouse. I have a database mydb inside HDFS at hdfs://localhost:8020/user/hive/warehouse/mydb.db. How can I create a table and insert data into it using PySpark? Please suggest.
Praveen Mandadi
1 vote, 0 answers, 357 views

How can I find the latest partition in Impala tables?

I need to collect incremental stats frequently on a table; for that, I need to populate the latest partition for the variable below: compute incremental stats someSchema.someTable partition (partitionColName=${value}); I have a few options which I don't want to use for stability and perfo...
roh
1 vote, 0 answers, 77 views

Why does Spark SQL throw a "partition location does not exist" exception?

Below is my Python code to get Hive data with Spark SQL: warehouse_location = 'file:///path/to/warehouse' spark = SparkSession \ .builder \ .config('spark.sql.warehouse.dir', warehouse_location) \ .enableHiveSupport() \ .getOrCreate() spark.sql("select * from tbl where dt='A'") But Spark throws a File...
MoreFreeze
