Questions tagged [impala]

1

votes
2

answer
32

Views

How to return preceding row value with column condition in SQL table?

I have the below SQL table in which I need the most recent price only when condition type is 00. Table: ProductID ConditionType Date Price 00001 01 2018-01-01 4.00 00001 01 2018-01-08 5.00 00001 00 2018-01-09 4.50 00001 01...
QuestionAsker
0

votes
0

answer
5

Views

Using Slick with Kudu/Impala

Kudu tables can be accessed via Impala and thus through its JDBC driver, which makes them accessible via the standard Java/Scala JDBC API. I was wondering whether it is possible to use Slick for this or, if not, whether any other high-level Scala DB framework supports Impala/Kudu.
abalcerek
1

votes
0

answer
357

Views

How can I find the latest partition in Impala tables?

I need to collect incremental stats frequently on a table. For that, I need to populate the latest partition into the variable below: compute incremental stats someSchema.someTable partition (partitionColName=${value}); I have a few options which I don't want to use for stability and perfo...
roh
1

votes
0

answer
167

Views

Hanging JDBC query with Impala

I am having a weird issue. I am building an automated A/B testing tool in order to compare Impala to other sources. I have 100 queries I am trying to run, which run fine through other sources and within Impala through Hue. When I try to run certain queries against Impala from Java, they hang. I run same...
user434290
1

votes
0

answer
116

Views

Failed to process impala request after drop and create partition

OS: CentOS 7.4.1708, Python: 3.6, impyla==0.14.0, thriftpy==0.3.9. I have one use case: (1) create table, (2) add partition, (3) insert values, (4) drop partition (to remove old date), (5) create partition, (6) insert values. On step 6 I got the error thriftpy.transport.TTransportException: TTransportException(type=4, message='TSock...
1

votes
0

answer
221

Views

Impala SQL with Multiple Count Distinct - Help Needed

We have been trying hard for the last several weeks to find a solution to an Impala SQL query problem, and we would appreciate any guidance or advice on this situation. Below is the table for our requirement (image also attached). We need to report counts against each endpoint_type. In my case its NAS a...
sudeep
1

votes
1

answer
171

Views

Change the starting day of the week returned by impala trunc()

I am using impala to find the starting day of the week, like this: select TRUNC('2018-01-01', 'D') Which gives the start day based on a Monday - Sunday week. Is there any way to change this behavior to give me a Sun - Sat week? I need to change it for my query only, changing a server or cluster wi...
Tony
0

votes
0

answer
8

Views

How to fix space in between the row values in hive table stored in parquet format?

I am trying to load a pipe-delimited file into Hive and store it in Parquet format. I am getting a whitespace character in all the rows. In the pipe-delimited input file there are no spaces: ID-12345 Name-ADAM. But it's getting stored with whitespace in between. This is happening for all rows. ID 1...
ganesh o
1

votes
1

answer
253

Views

How to specify the timestamp format when creating a table using a hdfs directory

I have the following csv file located at the path/to/file in my hdfs store. 1842,10/1/2017 0:02 7424,10/1/2017 4:06 I'm trying to create a table using the below command: create external table t ( number string, reported_time timestamp ) ROW FORMAT delimited fields terminated BY ',' LOCATI...
akilat90
1

votes
0

answer
238

Views

How can I refresh a Hive/Impala table from Spark Structured Streaming?

currently my Spark Structured Streaming goes like this (Sink part displayed only): //Output aggregation query to Parquet in append mode aggregationQuery.writeStream .format('parquet') .trigger(Trigger.ProcessingTime('15 seconds')) .partitionBy('date', 'hour') .option('path', 'hdfs://:8020/user/myuse...
messenjah00
1

votes
0

answer
22

Views

Simulate a sql reproducible example

I want to know a way to write a minimal reproducible example in impala. Here is my way. select 1 as x1, 2 as x2 union all select 3 as x1, 4 as x2 union all select 5 as x1, 6 as x2 union all select NULL as x1, 8 as x2 Is there any way to simulate an example more easily?
Jiaxiang
1

votes
1

answer
151

Views

asterisk or percentage sign in impala

The percentage sign (%) is used as the 'everything' wildcard instead of an asterisk. It will match zero or more characters. As @onedaywhen said, the two have the same function. But in Impala, I find they each work only in specific, different situations. show tables like ' ' Suppose in my database opd, there a...
Jiaxiang
1

votes
0

answer
223

Views

Select all except one impala

I am looking for an approach to exclude a column from an inner SELECT query in Impala. I am able to do it in Hive. Has anyone tried it in Impala? Hive: select `(col_name)?+.+` from t1 ; -- to exclude a column in Hive. Impala: I tried the same format in Impala, but it's throwing the...
Govind
1

votes
2

answer
47

Views

Left Join Not Yielding Results

I have the query below, where I need all records from the first table (table1) and the corresponding value from table2. If the value is not there, NULL should be returned. But I am getting only the common records in the result. select distinct s1.src_sys_id schema_nm, to_date(CAST(CAST(s3.execn_ts AS BIGINT)/1000...
sudeep
1

votes
0

answer
2k

Views

Refresh hive tables in Hive

I have a few tables in Hive, and every day a new CSV file is added to the Hive table location. When new data is available I need to refresh the tables so that I can see the new data in them. Steps we follow to load the data: first create a table with csv serde properties create another table wit...
Hari
1

votes
1

answer
97

Views

Date/String Comparison in Impala doesn't work (always return false)

I am currently writing an Impala query which essentially groups the data by several columns and takes the values of the remaining columns from the most recent rows. However, as I want to group the data by date, the comparison always returns false when comparing the dates. My code is as...
1

votes
0

answer
146

Views

How to handle hive and hbase integration which contains complex type: map and array

I would like to integrate hive and Hbase and query data from impala via hive metadata. hbase version: 1.2.0-cdh5.14.2 hive version: 1.1.0-cdh5.14.2 impala version: 2.11.0--cdh5.14.2 In the HBase table, there is only one column family and there are some columns in the column family which contains st...
Alvin
1

votes
0

answer
228

Views

Impala [Catalog] and Hive [Metastore/Sentry] Not Sync

We use Cloudera (CDH 5.7.5) and Hue [3.9.0]. For the admin user, some of the Hive tables (60%) are accessible through Impala; the other Hive tables are not. For a non-admin user, no database is accessible through Impala, although some databases are accessible via Hive. Is it because Impala c...
Mahadi Siregar
1

votes
1

answer
27

Views

Progress Data Direct

We have Cloudera Impala as a data source. We are required to build a new custom ODBC driver (we need to use our custom REST API and code logic in the new ODBC driver) to read data from Impala. That custom ODBC driver needs to be recognised by Tableau as a custom driver for Impala connectivity. As we...
katari. kusuma
1

votes
1

answer
60

Views

How to Concatenate two dataframes in impala

I'm looking for the syntax to concatenate two dataframes in Impala. I want to avoid doing that operation in R because I would have to import the dataframes into R. Thank you for your help!
Antoine F
1

votes
0

answer
226

Views

Hive and Impala showing different roles for user with Sentry installed

I am running Cloudera 5.15, with Kerberos enabled on the cluster. Sentry is installed to configure user access to various tables/databases, etc. Everything is installed and working fine for Hive, but not for Impala. I'm using the Hue web UI for issuing Hive/Impala queries. (I'm getting same results u...
rb21220689
1

votes
2

answer
171

Views

Create parameterized view in Impala

My goal is to create a parameterized view in Impala so users can easily change values in a query. If I run the query below in Hue, for example, it is possible to supply a value. SELECT * FROM customers WHERE customer_id = ${id} But I would like to create a view as follows, that when you run it, it ask...
1

votes
2

answer
89

Views

How to find KUDU master name or port in which KUDU DB in my cloudera cluster?

I am trying to write a Spark dataframe to Kudu DB, but I do not know the Kudu master. The cluster I am using is a Cloudera cluster. How do I find Kudu master in the cluster?
Karthik reddy
1

votes
0

answer
22

Views

Tableau Impala empty dates

I've been creating Tableau dashboards for a while now, using the ZN(Lookup(SUM(field), 0)) trick to pad empty dates with zeros. However, when using the same trick with a live Impala connection to a Hadoop cluster, it seems that the output is no longer padded. Does anyone know why it n...
c3luong
1

votes
1

answer
86

Views

connecting PBI to impala

I created a Cloudera cluster (ENTERPRISE DATA HUB) on Azure. I can use DNSname:7180 to view and manage the cluster. However, I am not successful in connecting to Impala from Power BI Desktop. I tried both VM names with dn0 and mn0 extension ([myhostname]-dn0.eastus2.cloudapp.azure.com) and port...
justin
1

votes
0

answer
37

Views

How to load Python Dataframe to cloudera Impala?

getting error while loading the dataframe data to Impala table DB = conn.cursor() for row in fourth_set: SQL = ('''Insert into Boots_retailer(sale_date, product, Assessment, weekno, store_Number, volume, turnover, turnover_missing, Inv_Cubic, XGB, KNN) values(?,?,?,?,?,?,?,?,?,?,?)''' ) Values = ro...
Harsha Varanasi
1

votes
0

answer
99

Views

Create External Table in Impala on a Parquet directory with multiple parquet files with different schemas

In Spark, we can read multiple Parquet files with different schemas by setting the mergeSchema option to true. Is there any similar functionality in Impala that allows us to point an external table to a directory that has multiple Parquet files with different schemas? Example: We have MEDICAL.parque...
Shuan
1

votes
1

answer
20

Views

What is the best way to find out in Impala if table a is a subset of table b?

I have two Parquet-based external tables in Impala and would like to know whether one is a subset of the other. What would be the best way to determine that? The two tables have the same schema, with dozens or even hundreds of fields. Thank you.
mdivk
1

votes
1

answer
37

Views

Find users who logged in during a specific time after registration

I was trying to figure out how to approach the following. I have a table registrations:

+---------+---------------------+
| user_id | reg_date            |
+---------+---------------------+
| a       | 2018-11-01 20:47:46 |
| b       | 2018-11-02 21:07:15 |
| c       | 2018-11-03 05:24:31 |
+------...
Nikita
1

votes
0

answer
44

Views

Apache Kudu TServer goes down when I use CTAS (Create Table As) hence my insertion fails

I have a table in Cloudera Impala (Parquet format). The table statistics are: Size: 23GB, Rows: 67M, RowSize: approx. 5KB, Columns: 308. My Cloudera cluster has 6 nodes in total (Disk: 84TB each, RAM: 251GB each). Kudu Master and Tablet Server 2 Master Nodes, 5 Tablet Servers...
Shahab Niaz
1

votes
0

answer
86

Views

How to install latest version of Apache Impala on AWS EMR?

How can I install Impala on AWS EMR? Is there any way to install Impala on AWS? Is there a bootstrap script that can install the latest version of Impala on AWS EMR?
AKSHAY SHINGOTE
1

votes
1

answer
144

Views

How to load data to Hive table and make it also accessible in Impala

I have a table in Hive: CREATE EXTERNAL TABLE sr2015( creation_date STRING, status STRING, first_3_chars_of_postal_code STRING, intersection_street_1 STRING, intersection_street_2 STRING, ward STRING, service_request_type STRING, division STRING, section STRING ) ROW FORMAT SERDE 'org.apache.hadoop....
mdivk
1

votes
0

answer
46

Views

Do I need to do compute stats after insert overwrite in impala

Suppose I have done create table and compute stats before, I do it behind. create table xxx; insert table xxx; compute stats xxx; And I can use describe formatted this table to find something useful.
Jiaxiang
1

votes
0

answer
31

Views

Is using a timestamp field with concat(to_date) the most efficient way to query previous day in Impala?

I am querying data from HDFS using Impala in a python script using the python library Impyla. The specific data is proxy data and there is tons of it. I have a script that runs daily to pull the previous day and runs statistics. Currently I am using the devicereceipttime field for this query whic...
sectechguy
1

votes
0

answer
57

Views

Connection timeout expired while connecting to impala with impala JDBC Driver

I am using impala2.12.0-cdh5.16.1 and connecting to Impala with impala_jdbc_2.6.4.1005. Normally it runs very well, but when I run distcp (which consumes cluster network I/O and HDFS I/O), the Java program may throw errors. 2019/02/28 12:54:26 531873 ERROR run.QihooStatusTask(run:88) - [Cloudera][Imp...
Y.Zhang
1

votes
1

answer
39

Views

SubQuery works in IMPALA but not HIVE

I'm trying to understand why the following subquery will work in Impala and not Hive. select * from MySchema.MyTable where identifier not in (select identifier from schema.table where status_code in (1,2,3)); EDIT: Added the error Error while compiling statement: FAILED: SemanticException [Error 1...
DukeLuke
1

votes
1

answer
21

Views

Hive Query : To calculate max indicator value based on priority and date

I tried to frame the query but somehow not getting the required result hence posting. I am new to hive. Apologies if it is very simple. Source Data : Ik - priority - ind1 - ind2 - date 1 - A - y - n - 2009/01/01 1 - B - n - y - 2019/02/09 1 - C -...
Queryguy
1

votes
2

answer
16

Views

Impala SQL query group by with multiple conditions

Given the following situation: CREATE TABLE IF NOT EXISTS `table1` ( `time` int(11) NOT NULL, `aircraft` varchar(50) NOT NULL, `height` int(11) NOT NULL ); INSERT INTO `table1` (`time`, `aircraft`, `height`) VALUES (1, 'klm', 605), (2, 'klm', 603), (3, 'klm', 705), (6, 'klm', 505), (1, 'klm2', 601),...
user2511309
1

votes
0

answer
12

Views

How do global variables of UDFs written in Java act in Cloudera Impala?

I have a UDF written in Java which propagates the last non-null value through rows ordered by row_number, but only if the actual value is 9. Those values can make a distinction between different components. For example: Row number | Component | Value --------------------------------- 1 1 3 2...
LSG
1

votes
1

answer
11

Views

Unable to get Impala JDBC connection in Jboss6

I am trying to get the impala jdbc connection in an application deployed on Jboss 6; below is the spring bean (ID1) datasource definition for same. I am getting the exception while connecting; please refer to the exception below. The below exception occurs only when the application also tries to c...
MyStack
