Questions tagged [apache-kudu]

0 votes · 0 answers · 5 views

Using Slick with Kudu/Impala

Kudu tables can be accessed via Impala, and thus via its JDBC driver. Thanks to that, they are accessible through the standard Java/Scala JDBC API. I was wondering if it is possible to use Slick for this. If not, does any other high-level Scala DB framework support Impala/Kudu?
abalcerek
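Since Slick ships no Impala profile, lifted (typed) queries will not generate valid Impala SQL; plain-SQL queries over the JDBC driver are the realistic option. A minimal sketch, assuming the Cloudera Impala JDBC driver is on the classpath — the host, table, and driver class version (`jdbc41`) are assumptions:

```scala
import slick.jdbc.JdbcBackend.Database
// Any profile's api import provides the plain-SQL interpolator; the profile's
// dialect only matters for lifted queries, which we avoid here.
import slick.jdbc.PostgresProfile.api._

import scala.concurrent.Await
import scala.concurrent.duration._

object ImpalaSlickSketch extends App {
  // Hypothetical host; 21050 is Impala's default HiveServer2 port.
  val db = Database.forURL(
    url = "jdbc:impala://impala-host:21050/default",
    driver = "com.cloudera.impala.jdbc41.Driver")

  // Plain SQL is sent to Impala verbatim, so Impala's dialect applies.
  val names = sql"SELECT name FROM my_kudu_table LIMIT 10".as[String]
  println(Await.result(db.run(names), 1.minute))
}
```

This trades Slick's compile-time query checking for compatibility; the `Database` connection-pooling and async execution still apply.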
1 vote · 0 answers · 493 views

kerberos authentication in Kudu for spark2 job

I am trying to put some data into Kudu, but the worker cannot find the Kerberos token, so I am not able to write to the Kudu database. Here you can see my spark2-submit statement: spark2-submit --master yarn --conf 'spark.yarn.maxAppAttempts=1' --conf 'spark.authenticate=true' --deploy-mode cluster...
Lukas
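In cluster deploy mode the driver runs inside YARN and cannot read the local ticket cache, so a keytab has to be shipped with the job. A sketch of the usual submit flags — the principal, keytab path, and jar name are hypothetical:

```shell
# Sketch: --principal/--keytab let YARN log in and renew tickets on the cluster
# side, instead of relying on the submitter's local kinit session.
spark2-submit \
  --master yarn \
  --deploy-mode cluster \
  --conf spark.yarn.maxAppAttempts=1 \
  --conf spark.authenticate=true \
  --principal etl_user@EXAMPLE.COM \
  --keytab /home/etl_user/etl_user.keytab \
  my-kudu-job.jar
```

Note also that the kudu-spark `KuduContext` forwards authentication credentials from the driver to the executors; constructing a raw `KuduClient` inside executor code typically fails Kerberos in exactly this way.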
1 vote · 1 answer · 206 views

How to insert data from Kafka to Kudu using Spark streaming

I have a Spark Streaming application that listens to a Kafka topic. When the data arrives I need to process it and send it to Kudu. Currently I am using the org.apache.kudu.spark.kudu.KuduContext API and call the insert action with the data frame. In order to create the data frame from my data I need to ca...
LubaT
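The shape of that pipeline can be sketched as below, assuming a DStream `stream` and a record-parsing function `parse` (both hypothetical), with kudu-spark2 on the classpath; the master address and table name are assumptions, and tables created through Impala are addressed as `impala::db.table`:

```scala
import org.apache.kudu.spark.kudu.KuduContext
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("KafkaToKudu").getOrCreate()
// One KuduContext for the whole application, created on the driver.
val kuduContext = new KuduContext("kudu-master:7051", spark.sparkContext)

stream.foreachRDD { rdd =>
  import spark.implicits._
  // parse: your Kafka-record-to-case-class function (hypothetical)
  val df = rdd.map(parse).toDF()
  kuduContext.insertRows(df, "impala::default.events")
}
```

`insertRows` batches writes per partition, so there is no per-record round trip to the tablet servers.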
1 vote · 2 answers · 89 views

How to find the Kudu master name or port for the Kudu DB in my Cloudera cluster?

I am trying to write a Spark dataframe to a Kudu DB, but I do not know the Kudu master address. The cluster I am using is a Cloudera cluster. How do I find the Kudu master in the cluster?
Karthik reddy
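Two quick ways to recover the master address on a Cloudera cluster; the table name below is hypothetical:

```shell
# If the table was created through Impala, SHOW CREATE TABLE exposes the
# master address in the kudu.master_addresses table property:
impala-shell -q "SHOW CREATE TABLE my_kudu_table"

# Alternatively, in Cloudera Manager: Clusters > Kudu > Instances lists the
# master host(s); the default Kudu master RPC port is 7051.
```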
1 vote · 0 answers · 96 views

Kudu Client fails with exceptions after running for a few days

I have a Scala/Spark/Kafka process that I run. When I first start the process I create a KuduClient Object using a function I made that I share between classes. For this job I only create the KuduClient once, and let the process run continuously. I've noticed that after several days I frequently get...
alex
1 vote · 0 answers · 44 views

Apache Kudu TServer goes down when I use CTAS (Create Table As), hence my insertion fails

I have a situation where I have a table in Cloudera Impala (Parquet format). The table statistics are: Size: 23GB, Rows: 67M, Row size: approx. 5KB, Columns: 308. My Cloudera cluster has 6 nodes in total (Disk: 84TB each, RAM: 251GB each). Kudu masters and tablet servers: 2 master nodes, 5 tablet servers...
Shahab Niaz
1 vote · 0 answers · 56 views

How do I measure the size of a Kudu table?

I am starting to work with Kudu, and the only way I have found to measure the size of a table in Kudu is through Cloudera Manager - KUDU - Chart Library - Total Tablet Size On Disk Across Kudu Replicas. Is there another way to find it through the command line?
Skiel
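A command-line sketch; the master address and table name are hypothetical, and the `table statistics` subcommand is version-dependent (it appeared around Kudu 1.11):

```shell
# List tables, then ask for per-table stats (on-disk size, live row count)
# in releases that support it:
kudu table list kudu-master:7051
kudu table statistics kudu-master:7051 my_table

# Older releases: per-tablet on-disk sizes are visible on each tablet
# server's web UI (default port 8050) or its /metrics endpoint.
```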
1 vote · 0 answers · 19 views

How to get a range of rows (e.g. the 1000th~2000th rows) with Apache Kudu?

I'm using Apache Kudu for study, but how can I get a specific range of rows? For example, I want to get the 1000th to the 2000th rows. I have found some client APIs for setting scan bounds by key: Status AddLowerBound(const KuduPartialRow& key); Status AddExclusiveUpperBound(const KuduPartialRow& key)...
Ming Zhang
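Kudu scans have no notion of a row offset — the bound APIs above take primary-key values, not ordinal positions. If ordinal ranges are really needed, one workaround is an analytic function through Impala; the table and ordering column below are hypothetical:

```shell
# Emulate "rows 1000..2000" with row_number() over a deterministic ordering.
# This still scans and ranks the qualifying rows, so it is not a cheap seek.
impala-shell -q "
SELECT * FROM (
  SELECT t.*, row_number() OVER (ORDER BY id) AS rn
  FROM my_table t
) ranked
WHERE rn BETWEEN 1000 AND 2000;"
```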
1 vote · 1 answer · 21 views

Is Java required to install Kudu?

To install Apache Kudu, is Java required as a prerequisite? I am planning to install Kudu in a separate VM; what are all the prerequisites?
Buvi
1 vote · 1 answer · 785 views

Load a text file into Apache Kudu table?

How do you load a text file into an Apache Kudu table? Does the source file need to be in HDFS space first? If Kudu doesn't share the same HDFS space as other Hadoop ecosystem programs (i.e. Hive, Impala), is there an Apache Kudu equivalent of: hdfs dfs -put /path/to/file that I should run before I try to load the file?
boethius
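Kudu stores its own data (not in HDFS), so the usual route is to stage the text file in HDFS as an external text table and then `INSERT ... SELECT` into the Kudu table through Impala. A sketch with hypothetical paths, schema, and table names:

```shell
# 1) Stage the raw file in HDFS.
hdfs dfs -mkdir -p /staging/mydata
hdfs dfs -put /local/path/data.csv /staging/mydata/

# 2) Expose it as an external text table, then copy into the Kudu table.
impala-shell -q "
CREATE EXTERNAL TABLE staging_csv (id BIGINT, name STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/staging/mydata';
INSERT INTO my_kudu_table SELECT id, name FROM staging_csv;"
```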
1 vote · 1 answer · 768 views

How to write and update by kudu API in Spark 2.1

I want to write and update via the Kudu API. These are the Maven dependencies: org.apache.kudu kudu-client 1.1.0, org.apache.kudu kudu-spark2_2.11 1.1.0. In the following code, I have no idea about the KuduContext parameter. My code in spark2-shell: val kuduContext = new KuduContext('master:7051') Also the same...
Autumn
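A sketch of the `KuduContext` setup for spark2-shell, assuming kudu-spark2 is on the classpath; the master address and table name are hypothetical. Note the constructor signature changed across releases:

```scala
import org.apache.kudu.spark.kudu.KuduContext

// Current kudu-spark2 releases take the master address *and* the
// SparkContext; very old releases (around 1.1.0) took only the address,
// so match the form to the artifact version you depend on.
val kuduContext = new KuduContext("kudu-master:7051", spark.sparkContext)

// Writes take a DataFrame plus the Kudu table name:
// kuduContext.insertRows(df, "impala::default.my_table")
// kuduContext.updateRows(df, "impala::default.my_table")
```

Also note Scala string literals use double quotes; `'master:7051'` will not compile in spark2-shell.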
1 vote · 1 answer · 347 views

sqoop syntax to import to kudu table

We'd like to test Kudu and need to import data. Sqoop seems like the correct choice. I find references that you can import to Kudu but no specifics. Is there any way to import to Kudu using Sqoop?
Jay
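Sqoop has no native Kudu connector, so the common pattern is a two-step load: Sqoop into HDFS, then `INSERT ... SELECT` into Kudu through Impala. A sketch with hypothetical connection details, paths, and table names:

```shell
# 1) Sqoop the source table into HDFS as Parquet.
sqoop import \
  --connect jdbc:mysql://db-host/source_db \
  --username etl --password-file /user/etl/.pw \
  --table orders \
  --target-dir /staging/orders \
  --as-parquetfile

# 2) Expose the staged files to Impala and copy into the Kudu table.
impala-shell -q "
CREATE EXTERNAL TABLE orders_staging (id BIGINT, amount DOUBLE)
STORED AS PARQUET LOCATION '/staging/orders';
INSERT INTO orders_kudu SELECT id, amount FROM orders_staging;"
```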
1 vote · 1 answer · 0 views

Using parts of the primary key to improve searching in KUDU

I have a primary key composed of three columns (id_grandparent, id_parent, id_row) which is residing in KUDU. I want my lookups to be fast (hbase-like) when looking by id_grandparent. I'm using Impala and Spark to do lookups, let's assume both of them do the predicate pushdown on equality. I have so...
BiS
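Kudu's primary-key index is sorted on the full composite key, so leading the key with id_grandparent makes an equality predicate on it a contiguous key-range scan; hash-partitioning on the same column additionally prunes to a single tablet. A DDL sketch with hypothetical table and column names:

```shell
impala-shell -q "
CREATE TABLE family_rows (
  id_grandparent BIGINT,
  id_parent BIGINT,
  id_row BIGINT,
  payload STRING,
  -- key column order matters: the leading column drives prefix lookups
  PRIMARY KEY (id_grandparent, id_parent, id_row)
)
PARTITION BY HASH (id_grandparent) PARTITIONS 16
STORED AS KUDU;"
```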
1 vote · 1 answer · 191 views

How to test spring batch step which reads from database and writes into a file?

I would like to know what would be the best approach to test the below scenario in a Spring Batch job: a job consisting of two steps: 1) The first step reads from a database using an ItemReader (from Apache Kudu via Impala) and writes the content generated by the query into a file. That ItemReader...
1 vote · 0 answers · 29 views

Insert into table KUDU by datastage

I am writing to enquire about a problem in my process. I have a Kudu table. When I try to insert, via DataStage (11.5 or 11.7) using the Impala JDBC driver, a new row whose size is bigger than 500 characters, I receive this error: Fatal Error: The connector failed to execute the statement: INSERT INTO d...
Stephane de Paula
3 votes · 0 answers · 52 views

Impala concurrent query delay

My cluster configuration is as follows: 3 Node cluster 128GB RAM per cluster node. Processor: 16 core HyperThreaded per cluster node. All 3 nodes have Kudu master and T-Server and Impala server, one of the node has Impala catalogue and Impala StateStore. My issues are as follows: 1) I've a hard time...
Prog_G
2 votes · 1 answer · 1k views

Apache Kudu vs InfluxDB on time series data for fast analytics

How does Apache Kudu compare with InfluxDB for IoT sensor data that requires fast analytics (e.g. robotics)? Kudu has recently released v1.0 I have a few specific questions on how Kudu handles the following: Sharding? Data retention policies (keeping data for a specified number of data points, or ti...
2 votes · 0 answers · 256 views

Impala KUDU table - howto bulk update

I need to perform updates on a Kudu table. Is there any option to update in bulk? The flow is as follows: 1. Fetch 1000 rows 2. Process the rows, calculating a new value for each 3. Update the Kudu table with the new values. Updating row by row, with one DB query per row, is slow. I am seeking a bulk update soluti...
Yuriy Homyakov
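With kudu-spark, the three steps above collapse into a read, a DataFrame transformation, and a single `updateRows` call, which batches writes per partition instead of issuing one query per row. A sketch assuming kudu-spark2 on the classpath; the master address, table name, and recalculation are hypothetical:

```scala
import org.apache.kudu.spark.kudu.KuduContext
import org.apache.spark.sql.functions.col

val kuduContext = new KuduContext("kudu-master:7051", spark.sparkContext)

// 1) Fetch the rows to change (predicates are pushed down to Kudu).
val batch = spark.read
  .format("org.apache.kudu.spark.kudu")
  .option("kudu.master", "kudu-master:7051")
  .option("kudu.table", "impala::default.my_table")
  .load()
  .limit(1000)

// 2) Compute the new values (hypothetical recalculation).
val updated = batch.withColumn("value", col("value") * 2)

// 3) One bulk call; rows are matched by primary key and batched per partition.
kuduContext.updateRows(updated, "impala::default.my_table")
```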
2 votes · 0 answers · 35 views

Zeppelin/Jupyter Notebook for KUDU

We are trying to connect a Zeppelin notebook to Kudu via Impala. We didn't find any existing Kudu interpreters; in addition, we tried to find Impala interpreters. Any help would be appreciated. Rony
ron
2 votes · 2 answers · 1.1k views

How to access to apache kudu table created from impala using apache spark

I downloaded the Apache Kudu quickstart VM and followed the examples exactly as they appear on this page: https://kudu.apache.org/docs/quickstart.html. In fact, I created the table named 'sfmta', but when I tried to access the Kudu table using spark-shell with the following statement: va...
Joseratts
0 votes · 0 answers · 242 views

Kudu client has already been closed in spark streaming

I want to read and write Kudu in Spark Streaming (per the docs), but it failed. The code is as follows: val sparkConf = new SparkConf().setAppName('DirectKafka').setMaster('local[*]') val ssc = new StreamingContext(sparkConf, Seconds(2)) val messages = KafkaUtils.createDirectStream('') messages.foreachRD...
Autumn
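The "client has already been closed" failure usually comes from holding a raw `KuduClient` (or a closed one) inside streaming closures. The usual fix is to create one `KuduContext` on the driver and use its DataFrame write methods inside `foreachRDD`. A sketch with hypothetical names (`toRow`, `schema`, the master address, and the table are all assumptions):

```scala
import org.apache.kudu.spark.kudu.KuduContext

// Created once on the driver; kudu-spark manages the underlying clients and
// their credentials on the executors, so nothing is closed under you.
val kuduContext = new KuduContext("kudu-master:7051", ssc.sparkContext)

messages.foreachRDD { rdd =>
  // toRow/schema: your record conversion and schema (hypothetical)
  val df = spark.createDataFrame(rdd.map(toRow), schema)
  kuduContext.upsertRows(df, "impala::default.events")
}
```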
0 votes · 1 answer · 64 views

Hadoop Key-Value store with remote deploy

My application is launched from a remote PC via spark-submit in yarn-cluster mode, with a Kerberos keytab and principal, following this guide: https://spark.apache.org/docs/latest/running-on-yarn.html. The advantage of this approach is that I have my own version of Spark on any cluster. Is it possible t...
Andrei Iatsuk