Giorgos Myrianthous

1

votes
1

answer
402

views

Maximum retries and retry interval for Kafka JDBC Sink Connector when a database is down

I am trying to test and evaluate the behavior of a Kafka JDBC Sink Connector when the database is down. When a new message is received in Kafka while the database is down, the following error is reported: INFO Unable to connect to database on attempt 1/3. Will retry in 10000 ms. (io.confluent.conn...
1

votes
1

answer
1k

views

RDD[Vector] to dataframe

I have an instance of a RowMatrix that contains a single column. I am trying to turn this RowMatrix into a dataframe but I am not quite sure how to convert org.apache.spark.rdd.RDD[org.apache.spark.mllib.linalg.Vector] to a dataframe. val mat: RowMatrix = new RowMatrix(centred) val mat_rows = mat....
1

votes
1

answer
2.4k

views

Connecting a MySQL database to Apache Superset

I am trying to connect a MySQL database to Apache Superset but the following error is reported: sqlalchemy.exc.OperationalError: (_mysql_exceptions.OperationalError) (1045, 'Access denied for user 'supersetuser'@'localhost' (using password: YES)') I am using MAMP with MySQL running locally on port...
Giorgos Myrianthous
1

votes
1

answer
306

views

Error when trying to create a topic on Apache Kafka

I am trying to create a kafka topic using $KAFKA_HOME/bin/kafka-topics.sh --create --zookeeper localhost:2181 --replication-factor 1 --partitions 1 --topic testing but the following error is reported: Error while executing topic command : Not all brokers have rack information. Add --disable-rack-a...
Giorgos Myrianthous
1

votes
1

answer
1.6k

views

“Error registering Avro schema” when trying to stream data to Kafka

I am trying to reproduce the Serializer example found in Confluent's official documentation and stream data in avro format to a kafka topic. Here's the code: import org.apache.avro.Schema; import org.apache.avro.generic.GenericData; import org.apache.avro.generic.GenericRecord; import org.apache.k...
Giorgos Myrianthous
1

votes
4

answer
60

views

How to center a component in a View

I have a LoginForm component within a View export default class App extends Component { render() { return ( ); } } and I would like to center the LoginForm but flex: 1, justifyContent: 'center' don't seem to be working. What am I missing here? Note that LoginForm is rendering some other components....
Giorgos Myrianthous
1

votes
1

answer
514

views

Fitting pipeline and processing the data

I've got a file that contains text. What I want to do is to use a pipeline for tokenising the text, removing the stop-words and producing 2-grams. What I've done so far: Step 1: Read the file val data = sparkSession.read.text('data.txt').toDF('text') Step 2: Build the pipeline val pipe1 = new Token...
Giorgos Myrianthous
1

votes
1

answer
382

views

Kafka Connect Out of Java heap space after enabling SSL

I have recently enabled SSL and tried to start Kafka connect in distributed mode. When running connect-distributed connect-distributed.properties I get the following errors: [2018-10-09 16:50:57,190] INFO Stopping task (io.confluent.connect.jdbc.sink.JdbcSinkTask:106) [2018-10-09 16:50:55,471] ER...
Giorgos Myrianthous
1

votes
1

answer
281

views

KSQL join returns null fields

I am trying to join the following stream Field | Type --------------------------------------- ROWTIME | BIGINT (system) ROWKEY | VARCHAR(STRING) (system) USER_ID | VARCHAR(STRING) FIRSTNAME | VARCHAR(STRING) --------------------------------------- with the following table: Fi...
1

votes
2

answer
128

views

Retrieve cookie and send it in subsequent POST requests

I want to read two numbers (randomly generated) from a website which are then used in order to compute a result and then submit the result using a POST request. To do so, I will also need to submit the cookie of that session so that the system is aware of the random numbers which have been produced...
1

votes
1

answer
1k

views

How to transform and extract fields in Kafka sink JDBC connector

I am using a 3rd party CDC tool that replicates data from a source database into Kafka topics. An example row is shown below: { 'data':{ 'USER_ID':{ 'string':'1' }, 'USER_CATEGORY':{ 'string':'A' } }, 'beforeData':{ 'Data':{ 'USER_ID':{ 'string':'1' }, 'USER_CATEGORY':{ 'string':'B'...
0

votes
0

answer
3

views

Schema Registry fails to start after enabling HTTPS

I want to upgrade Schema Registry to allow REST API calls over HTTPS in an existing cluster and to do so I followed Confluent's documentation My configuration for both Schema Registry instances looks like below: listeners=https://0.0.0.0:8081 kafkastore.connection.url=kafka-host-1:2181,kafka-host-2...
Giorgos Myrianthous
1

votes
1

answer
884

views

How to check which zookeeper instance is the leader within an ensemble

Assume that I have an ensemble Zookeeper which is up and running to facilitate and serve Apache Kafka (Confluent's distribution). 3 instances (clientPorts: 2181, 2182 and 2183) have been configured and started as shown below: ./bin/zookeeper-server-start etc/kafka/zookeeper.properties ./bin/zookeep...
Giorgos Myrianthous
4

votes
1

answer
399

views

Stream delete events from MySQL to PostgreSQL via Apache-kafka

I am trying to stream events from MySQL to PostgreSQL using Apache Kafka. Although insertions and updates work fine, I can't figure out how to delete a record from MySQL and stream this event to PostgreSQL. Assume the following topology: +-------------+ | | | MySQL | |...
Giorgos Myrianthous
0

votes
0

answer
8

views

Secure Schema Registry API with SSL

I am trying to enable SSL on Schema Registry and would like to secure Schema Registry API too. According to the Confluent's documentation the following properties need to be configured in order to enable authentication and encryption using SSL listeners=http://0.0.0.0:8081 kafkastore.connection.url=...
Giorgos Myrianthous
3

votes
1

answer
709

views

sklearn: Naive Bayes classifier gives low accuracy

I have a dataset which includes 200000 labelled training examples. For each training example I have 10 features, including both continuous and discrete. I'm trying to use sklearn package of python in order to train the model and make predictions but I have some troubles (and some questions too)....
Giorgos Myrianthous
3

votes
1

answer
282

views

Apply PCA and keep a percentage of the total variance

I want to perform Principal Component Analysis on a particular dataset and then feed the principal components to a LogisticRegression classifier. Specifically, I want to apply PCA and keep the 90% of the total variance, using the function computePrincipalComponentsAndExplainedVariance. Here's the c...
1

votes
3

answer
1.5k

views

How to print all the columns of a Matrix

I have a Matrix that contains 5 columns in total. What I want to do is to print all the columns of the Matrix and not just the first 2 as shown below: val V: Matrix = svd.V // The V factor is a local dense matrix. println(V) gives the following output: -1.0237272594782074E-4 -1.7078345817841522E-4...
3

votes
0

answer
224

views

How to set consumer.id for new Kafka consumer

According to the docs, Old Consumer Configs included a consumer.id set to null by default: consumer.id Default: null Description: Generated automatically if not set. Is it possible to set the consumer.id for the New Kafka Consumer and if so, how can I achieve this?
7

votes
2

answer
730

views

Combinations of two lists (not element-wise)

I have two lists: a = ['a', 'b'] b = [1, 2, 3] I want to get the combinations produced between the elements of list b and the elements of list a but treating elements of a as pairs (or triples etc. etc.) as the example below which gives len(b) ** len(a) number of combinations. c = ['a_1 b_1', 'a_1 b...
1

votes
2

answer
776

views

Remove rows with missing values denoted by '?'

I have a .csv file that contains rows with missing values. Those values instead of null, are denoted by the character ?. How can I remove the rows that contain at least one column with value ?, given that df.na.drop() won't work (since the missing values are not null) ? The data looks like below (I'...
1

votes
1

answer
467

views

Error when trying to load JDBC sink connector

I am trying to stream data from a Kafka topic to a MySQL database unsuccessfully. Although the source connector works fine (i.e. streaming data from a MySQL database to kafka topic), sink connector fails to load. Here is my sink-mysql.properties file: name=sink-mysql connector.class=io.confluent.co...
Giorgos Myrianthous
1

votes
1

answer
293

views

Error when trying to join a table and a stream

I am trying to join a table and a stream and create another table as shown below: CREATE TABLE table_fx_latest AS SELECT t1.currencyid, t1.maxtimestamp, t2.midprice FROM stream_fx2 t2 LEFT JOIN table_fx_latest3 t1 ON t1.currencyid = t2.currencyid AND t1.timestamp = t2.maxtimestamp GROUP BY t1.cu...
Giorgos Myrianthous
1

votes
1

answer
599

views

How to join two Kafka streams and produce the result in a topic with Avro values

I've got two Kafka Streams with keys in String and values in Avro format which I have created using KSQL. Here's the first one: DESCRIBE EXTENDED STREAM_1; Type : STREAM Key field : IDUSER Timestamp field : Not set - using Key format : STRING Value format...
2

votes
2

answer
310

views

How to concatenate a coo_matrix with a column numpy array

I have a coo_matrix a with shape (40106, 2048) and a column numpy array b with shape (40106,). What I want to do is to simply concatenate the matrix and the array (i.e. the resulting data structure will have shape (40106, 2049) ). I've tried to use hstack as shown below concat = hstack([a, b]) but...
1

votes
2

answer
43

views

Get the maximum N elements (along with their indices) of an Array

I've got an array that contains Integers as the one shown below: val my_array = Array(10, 20, 6, 31, 0, 2, -2) I need to get the maximum 3 elements of this array along with their corresponding indices (either using a single function or two separate funcs). For example, the output might be something...
4

votes
1

answer
343

views

'START WITH' equivalent expression in MS-SQL

Oracle SQL supports START WITH expression. For instance, CREATE VIEW customers AS SELECT LEVEL lvl, customer_code, customer_desc, customer_category FROM customers_master START WITH some_column = '0' CONNECT BY PRIOR CUSTOMER_CODE = PARENT_CUSTOMER_CODE; If a table contains hierarchical data, then...
3

votes
2

answer
140

views

Compute possible combinations of an equation

I've got a template of equations template = (a op b) op c where the available operations op are the following: op = ['+', '-', '/', '*']. What I want to do is to compute all the possible combinations of this equation using the operations op and the parameters a, b and c. e.g. (a + b) + c (a - b)...
2

votes
1

answer
189

views

How to run two instances of schema registry

I am trying to run Kafka in cluster mode using two instances of schema registry but I am not quite sure how to configure the second instance so that it takes over in case the first one is down. Here's the properties file for the first schema-registry instance: port=8081 # The address the socket ser...
1

votes
2

answer
767

views

Kafka retention policies

Assume that I have a multi-broker (running on the same host) Kafka setup with 3 brokers and 50 topics each of which is configured to have 7 partitions and replication factor of 3. I have 50GB of memory to spend for kafka and make sure that Kafka logs will never exceed this amount of memory so I wan...
3

votes
0

answer
494

views

Text classification with Decision Trees and spark Dataframes

I have a dataset that contains a number of reviews and their corresponding labels (either positive or negative) and I want to extract features and build a pipeline to perform binary text classification using Decision trees. The problem is that I probably presenting the data to the classifier, in th...
2

votes
1

answer
41

views

Kafka Consumer rounds up decimal in string format

I have a topic in avro format that contains data similar to the one below: { 'data':{ 'NAME':{ 'string':'GIORGOS' }, 'BALANCE':{ 'string':'154711.5800000000' } }, 'op':'INSERT' } and a Kafka Consumer which is consuming data from this topic (My sink connector contains some transformations so...
Giorgos Myrianthous
5

votes
2

answer
307

views

Unable to run a JDBC Source connector with Confluent REST API

I want to run a JDBC source connector using Confluent's REST API. Although stand-alone mode works perfect using the following properties file: name=source-mysql-test connector.class=io.confluent.connect.jdbc.JdbcSourceConnector tasks.max=1 connection.url=jdbc:mysql://localhost:3306/kafka connectio...
8

votes
2

answer
2.6k

views

How to delete multiple topics in Apache Kafka

Assume that I have a number of topics with the same prefix, e.g: giorgos-topic1 giorgos-topic2 giorgos-topic3 ... The command used for deleting a single topic (say giorgos-topic1) is the following: ./bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic giorgos-topic1 Is it possible to del...
Giorgos Myrianthous