Questions tagged [apache]

72366 questions
0 votes · 1 answer · 11 views

How does Spring Kafka/Spring Cloud Stream guarantee the transactionality / atomicity involving a Database and Kafka?

Spring Kafka, and thus Spring Cloud Stream, allow us to create transactional Producers and Processors. We can see that functionality in action in one of the sample projects: https://github.com/spring-cloud/spring-cloud-stream-samples/tree/master/transaction-kafka-samples: @Transactional @StreamListe...
codependent
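For context, here is a minimal sketch (plain Scala against the raw Kafka client, not the Spring API) of the transactional producer that Spring Kafka builds on; the broker address, topic, and transactional.id are placeholders. Note that the Kafka transaction covers only the Kafka writes; the database commit is synchronized with it on a best-effort basis, not via two-phase commit.

    import java.util.Properties
    import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}
    import org.apache.kafka.common.serialization.StringSerializer

    val props = new Properties()
    props.put("bootstrap.servers", "localhost:9092") // placeholder broker
    props.put("transactional.id", "tx-producer-1")   // enables idempotence and transactions

    val producer = new KafkaProducer[String, String](props, new StringSerializer, new StringSerializer)
    producer.initTransactions()

    producer.beginTransaction()
    try {
      // a database write would happen here, committed just before the Kafka transaction
      producer.send(new ProducerRecord("output-topic", "key", "value"))
      producer.commitTransaction() // atomic only with respect to the Kafka writes
    } catch {
      case e: Exception =>
        producer.abortTransaction()
        throw e
    }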
0 votes · 0 answers · 4 views

Writing to DSE graph from EMR

We are trying to write to a DSE graph (Cassandra) from EMR and keep getting these errors. My JAR is a shaded jar with the BYOS dependencies. Any help would be appreciated. java.lang.UnsatisfiedLinkError: org.apache.cassandra.utils.NativeLibraryLinux.getpid()J at org.apache.cassandra.utils.N...
mat77
1 vote · 2 answers · 484 views

PySpark Structured Streaming: Pass output of Query to API endpoint

I have the following dataframe in Structured Streaming:

    TimeStamp | Room | Temperature
    00:01:29  | 1    | 55
    00:01:34  | 2    | 51
    00:01:36  | 1    | 56
    00:02:03  | 2    | 49

I am trying to detect when temperatures fall below a certain threshold (50 in this case). I have t...
DataGeek
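A common approach (sketched in Scala below, although the question uses PySpark) is foreachBatch, available since Spark 2.4: each micro-batch is handed to arbitrary driver-side code, such as an HTTP client. Here df stands for the streaming DataFrame from the question, spark for the active SparkSession, and postToApi is a hypothetical helper.

    import org.apache.spark.sql.DataFrame
    import spark.implicits._

    // hypothetical helper that POSTs one JSON record to the alerting endpoint
    def postToApi(json: String): Unit = { /* HTTP POST to the API endpoint */ }

    val query = df.filter($"Temperature" < 50) // rows below the threshold
      .writeStream
      .foreachBatch { (batch: DataFrame, batchId: Long) =>
        batch.toJSON.collect().foreach(postToApi) // assumes small alert batches
      }
      .start()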
1 vote · 1 answer · 550 views

Right Alignment of numerical value in PDF using Apache PDFBox library

My Spring Boot Java application uses the Apache PDFBox library (version 2.0.6) for generating PDFs. I want decimal values to be right-aligned, i.e. all decimal points should line up on the same vertical line. I have also attached a screenshot. stream.beginText(); stream.newLineAtOffset(xCordinate, yCordinate);...
1 vote · 1 answer · 848 views

EXCEPT on Specific columns Spark 1.6

I'm trying to filter out rows from dfA using dfB. dfA has the columns year, cid, X, Y, Z; dfB has the columns year and cid. My goal is to filter out of dfA all (year, cid) couples that appear in dfB. I see...
RefiPeretz
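Spark 1.6 has no anti-join type in the DataFrame API, so one workaround is a left outer join on the key columns plus a marker column; a rough sketch, assuming dfA and dfB as described:

    import org.apache.spark.sql.functions.{col, lit}

    // tag every dfB key pair, join on the keys, keep only dfA rows with no match
    val dfBTagged = dfB.select("year", "cid").distinct().withColumn("inB", lit(1))
    val result = dfA
      .join(dfBTagged, Seq("year", "cid"), "left_outer")
      .filter(col("inB").isNull)
      .drop("inB")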
1 vote · 1 answer · 728 views

How do I select an ambiguous column reference? [duplicate]

This question already has an answer here: Enable case sensitivity for spark.sql globally (1 answer). Here's some sample code illustrating what I'm trying to do. There is a dataframe with columns companyid and companyId. I want to select companyId, but the reference is ambiguous. How do I unambiguously...
chris.mclennon
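As the linked duplicate suggests, turning on case sensitivity makes companyid and companyId distinct names; a minimal sketch for Spark 2.x:

    // with case sensitivity enabled, the two column names no longer collide
    spark.conf.set("spark.sql.caseSensitive", "true") // or SET spark.sql.caseSensitive=true in SQL
    df.select("companyId")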
1 vote · 1 answer · 789 views

Is disposing SXSSFWorkbook necessary when used in try with resource

Below is a sample code snippet that creates an SXSSFWorkbook: try(SXSSFWorkbook wb = new SXSSFWorkbook()) { //... } finally { wb.dispose(); //wb not accessible over here, so can't use try with resource } The problem is that if I use try-with-resources, I can't dispose() the SXSSFWorkbook in finally, as the va...
eatSleepCode
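One simple pattern is to declare the workbook before an ordinary try/finally instead of using try-with-resources, so dispose() is still reachable; a sketch in Scala (the same shape works in Java):

    import org.apache.poi.xssf.streaming.SXSSFWorkbook

    val wb = new SXSSFWorkbook()
    try {
      // ... create sheets and rows, then write the workbook to an OutputStream ...
    } finally {
      wb.dispose() // deletes the temporary files backing the streaming workbook
      wb.close()
    }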
1 vote · 1 answer · 684 views

Consuming RESTful API and converting to Dataframe in Apache Spark

I am trying to convert the output of a URL from a RESTful API directly to a DataFrame in the following way: package trials import org.apache.spark.sql.SparkSession import org.json4s.jackson.JsonMethods.parse import scala.io.Source.fromURL object DEF { implicit val formats = org.json4s.DefaultFormats ca...
Utkarsh Saraf
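For reference, since Spark 2.2 the JSON reader accepts a Dataset[String], so the fetched body can be turned into a DataFrame without json4s; a sketch with a placeholder URL:

    import scala.io.Source
    import spark.implicits._

    val body = Source.fromURL("https://api.example.com/data").mkString // placeholder URL
    val df = spark.read.json(Seq(body).toDS) // Spark 2.2+: json(Dataset[String])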
1 vote · 2 answers · 2k views

How to increase Maximum Memory Pool Size? Apache Tomcat 9

I am trying to increase the maximum memory pool in the Apache Tomcat config. I am using GeoServer services behind Apache, but the memory in GeoServer fills up very fast and I constantly have to free memory from the GeoServer Server Status page. I have to increase my maximum memory to 2048 by sto...
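On a Linux/macOS install this is usually done by setting CATALINA_OPTS in bin/setenv.sh (creating the file if it does not exist); on a Windows service install the equivalent fields live in the Tomcat9w.exe configuration dialog. A sketch, assuming a 2048 MB maximum heap is the goal:

    # $CATALINA_HOME/bin/setenv.sh
    export CATALINA_OPTS="-Xms512m -Xmx2048m"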
1 vote · 1 answer · 1.7k views

Spark Dataframe to Kafka

I am trying to stream a Spark DataFrame to a Kafka consumer and am unable to do so; can you please advise me? I am able to pick up data from a Kafka producer into Spark, and I have performed some manipulation. After manipulating the data, I am interested in streaming it back to Kafka (to a consumer).
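Assuming Spark 2.2+ with the spark-sql-kafka-0-10 package on the classpath, a DataFrame can be written straight to a topic once it is reduced to a string (or binary) value column; the broker and topic names below are placeholders:

    df.selectExpr("to_json(struct(*)) AS value")
      .write
      .format("kafka")
      .option("kafka.bootstrap.servers", "localhost:9092") // placeholder
      .option("topic", "output-topic")                     // placeholder
      .save()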
1 vote · 3 answers · 4k views

How to create an empty dataFrame in Spark

I have a set of Avro-based Hive tables and I need to read data from them. As Spark-SQL uses Hive SerDes to read the data from HDFS, it is much slower than reading HDFS directly. So I have used the Databricks Spark-Avro jar to read the Avro files from the underlying HDFS dir. Everything works fine except w...
Vinay Kumar
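For reference, one way to create an empty DataFrame with an explicit schema (a useful fallback when the underlying HDFS dir has no Avro files), sketched for Spark 2.x with hypothetical column names:

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{IntegerType, StringType, StructField, StructType}

    val schema = StructType(Seq(
      StructField("id", StringType, nullable = true),
      StructField("count", IntegerType, nullable = true)
    ))
    val emptyDF = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)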
1 vote · 1 answer · 926 views

Spring kafka and Kafka Cluster

I've configured 3 Kafka brokers in a cluster and I'm trying to use them with spring-kafka, but after I kill the Kafka leader I'm not able to send further messages to the queue. I'm setting the spring.kafka.bootstrap-servers property as 'kafka-1:9092;kafka-2:9093,kafka-3:9094', and all of the names are in my hosts file. Kafka...
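Worth noting: bootstrap.servers is a comma-separated list, so the semicolon in the quoted value is suspect. A sketch of the plain-Kafka equivalent, with the same placeholder host names:

    import java.util.Properties

    val props = new Properties()
    // comma-separated; a semicolon would be read as part of the first address
    props.put("bootstrap.servers", "kafka-1:9092,kafka-2:9093,kafka-3:9094")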
0 votes · 0 answers · 4 views

Spark-Scala Log Rotation Issue, Unable to create External Log

Spark-Scala log rotation issue, unable to create an external log: I am unable to set up log rotation using RollingFileAppender & ConsoleAppender. I received the WARN below during execution and am unable to create any log file in the specified location; it is still using the default log file. ++++++++++++++++++++++++++++++++...
Srini K
1 vote · 1 answer · 775 views

Apache Nifi - ConvertJSONToSQL - JSON Does not have a value for the required column

I am experimenting with a tutorial I came across online, and here is its template. While the template ended with converting CSV to JSON, I want to go ahead and dump this into a MySQL table, so I created a new processor, ConvertJSONToSQL. Here are its properties, and these are the controller s...
Shashank
1 vote · 1 answer · 490 views

Spark dataframe calculate the row-wise minimum [duplicate]

This question already has answers here: Comparing columns in Pyspark (4 answers) and Row aggregations in Scala (1 answer). I'm trying to put the minimum value of a few columns into a separate column (creating the min column). The operation is pretty straightforward but I wasn't able to find the right f...
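As the linked duplicates note, Spark ships a built-in for this: least (and greatest for the row-wise maximum). A minimal sketch with hypothetical column names:

    import org.apache.spark.sql.functions.{col, least}

    // row-wise minimum across three columns; nulls are ignored unless all inputs are null
    val withMin = df.withColumn("min", least(col("colA"), col("colB"), col("colC")))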
1 vote · 2 answers · 374 views

How to check if key exists in spark sql map type

So I have a table with one column of map type (the key and value are both strings). I'd like to write Spark SQL like this to check whether a given key exists in the map: select count(*) from my_table where map_contains_key(map_column, 'testKey'). I couldn't find any existing Spark SQL functions that can do...
seiya
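There was no built-in map_contains_key at the time, but map_keys (Spark 2.3+) combined with array_contains expresses the same check; a sketch:

    // same predicate as the hypothetical map_contains_key in the question
    spark.sql(
      "SELECT count(*) FROM my_table " +
      "WHERE array_contains(map_keys(map_column), 'testKey')"
    )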
1 vote · 2 answers · 138 views

How to wait for GenerateTableFetch queries to finish

My use case is like this: I have some X tables to be pulled from MySQL. I am splitting them using SplitText to put each table in an individual flow file, and pulling them using GenerateTableFetch and ExecuteSQL. I want to be notified, or to trigger some other action, when the import is done for all the tables. At Spl...
pratpor
1 vote · 2 answers · 698 views

Spark collect_list and limit resulting list

I have a dataframe of the following format:

    name | merged
    key1 | (internalKey1, value1)
    key1 | (internalKey2, value2)
    ...
    key2 | (internalKey3, value3)
    ...

What I want to do is group the dataframe by name, collect the list, and limit the size of the list. This is how I group by the name...
pirox22
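On Spark 2.4+ the collected list can be truncated with the built-in slice function (on earlier versions, a UDF over the full list is the usual workaround); a sketch with a limit of 3:

    import org.apache.spark.sql.functions.{col, collect_list, slice}

    // collect per name, then keep only the first 3 elements of each list
    val limited = df
      .groupBy("name")
      .agg(slice(collect_list(col("merged")), 1, 3).alias("merged_top3"))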
1 vote · 1 answer · 116 views

Connecting to sql database from Spark

I am trying to connect to a SQL database from Spark, and I have used the following commands: scala> import org.apache.spark.sql.SQLContext...
Mahadevan Swamy
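For Spark 2.x, the DataFrame JDBC reader is the usual route; a sketch with placeholder URL, table, and credentials (the matching JDBC driver jar must be on the classpath):

    val jdbcDF = spark.read
      .format("jdbc")
      .option("url", "jdbc:mysql://dbhost:3306/mydb") // placeholder
      .option("dbtable", "mytable")                   // placeholder
      .option("user", "user")
      .option("password", "secret")
      .load()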
1 vote · 1 answer · 918 views

Kafka producer difference between flush and poll

We have a Kafka consumer which reads messages, does some stuff, and publishes them again to a Kafka topic using the script below. Producer config: { 'bootstrap.servers': 'localhost:9092' }. I haven't set any other configuration such as queue.buffering.max.messages, queue.buffering.max.ms, or batch.num.messages. I...
shakeel
1 vote · 1 answer · 198 views

What Happens when there is only one partition in Kafka topic and multiple consumers?

I have a Kafka topic with only one partition, and I am not sure what will happen in the following cases: how will messages be delivered to consumers if all consumers are in the same group, and if all consumers are in different groups? I am not sure whether consumers will receive unique messages or duplicate ones.
hard coder
1 vote · 2 answers · 274 views

Unable to run flink example program, Connection refused

./bin/flink run examples/streaming/SocketWindowWordCount.jar --port 9000 I ran the example program directly, following the official quickstart. The log is as follows; the cause seems to be java.net.ConnectException. I'm sure the port is not in use and the firewall is off. [email protected]:/home/maple/Downloads/f...
maple
1 vote · 1 answer · 138 views

Apache Nifi Expression Language - toDate formatting

I am trying to format a date string using the Apache NiFi expression language and the ReplaceText processor (regex). Given a date string date_str: '2018-12-05T11:44:39.717+01:00', I wish to convert this to correct_mod_date_str: '2018-12-05 10:44:39.717' (notice how the date is converted to UTC,...
irrelevantUser
1 vote · 2 answers · 63 views

Any idea for dynamic scaling flink job?

Suppose there is a Kafka topic with 10 partitions and we'd like to use Flink to consume it. We want the system to allocate slots dynamically according to the workload, meaning that if the workload is low, the Flink job can use fewer slots (less parallelism), and if the workload is high it can run wi...
snakie yu
0 votes · 0 answers · 4 views

Is there update operator support on MongoDB connector for Spark?

Using: Spark / MongoDB Connector for Spark / Scala. Is there support for update operators in the MongoDB Connector for Spark? I'd like to use an updateOne document query with the $inc, $currentDate, and $setOnInsert update operators, plus update options (like upsert = true). I already did this with mongo-scala...
gingermanjh
-1 votes · 3 answers · 23 views

Spark - Map flat dataframe to a configurable nested json schema

I have a flat dataframe with 5-6 columns. I want to nest them and convert it into a nested dataframe so that I can then write it to Parquet format. However, I don't want to use case classes, as I am trying to keep the code as configurable as possible. I'm stuck with this part and need some help. My...
devnong
1 vote · 1 answer · 218 views

Kafka Connect JDBC Sink Connector

I am trying to write data from a topic (JSON data) into a MySQL database. I believe I want a JDBC Sink Connector. How do I configure the connector to map the JSON data in the topic to rows to be inserted into the database? The only documentation I can find is this: 'The sink connector requires knowl...
Chris
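A rough sketch of a Confluent JDBC sink configuration, under the assumption that the topic's JSON carries embedded schemas (the sink needs a schema to map fields to columns; schemaless JSON must first be given one). Names and the connection URL are placeholders:

    name=mysql-sink
    connector.class=io.confluent.connect.jdbc.JdbcSinkConnector
    topics=my-topic
    connection.url=jdbc:mysql://dbhost:3306/mydb?user=user&password=secret
    insert.mode=insert
    auto.create=true
    value.converter=org.apache.kafka.connect.json.JsonConverter
    value.converter.schemas.enable=true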
1 vote · 1 answer · 47 views

Accessing information (Metadata) in the file name & type in a Beam pipeline

My filename contains information that I need in my pipeline; for example, the identifier for my data points is part of the filename and not a field in the data. E.g., every wind turbine generates a file turbine-loc-001-007.csv, and I need the loc data within the pipeline.
Reza Rokni
1 vote · 1 answer · 63 views

Node.js + Apache - https://localhost:3000/socket.io/ ERR_CONNECTION_REFUSED

I am implementing a simple game in Node.js. I have client.js for my client-side code and server.js running on a remote server, which both use sockets to communicate on port 3000. I am also running Apache on port 80 and using ProxyPass in my Apache configuration file to route the URL mywebsite...
Barney Chambers
1 vote · 4 answers · 53 views

Can't set Kafka retention policy to both compact and delete

According to the links below, I'm supposed to be able to set the retention.policy value to 'compact,delete': https://issues.apache.org/jira/browse/KAFKA-4015 https://cwiki.apache.org/confluence/display/KAFKA/KIP-71%3A+Enable+log+compaction+and+deletion+to+co-exist However, when I try to alter the rete...
emirhosseini
1 vote · 1 answer · 31 views

How to replace the content of a flow file that exists between [ and ] in Nifi?

I want to remove the entire content between the brackets of a flow file attribute. Attached is my sample flow file, in which I want to remove the content between [ and ]. May I know the search and replacement values to be used in the ReplaceText processor?
SPK
1 vote · 2 answers · 43 views

Access AWS services from Apache Nifi running on AWS

I have a NiFi instance running on an EC2 machine, and I am trying to access a restricted S3 bucket. Because generating access keys manually is not recommended, I want to give the machine the proper IAM role for accessing the outside bucket. I gave the EC2 machine a role which seems to work for every...
Ethan McCue
1 vote · 2 answers · 41 views

Exchange files (up to many GB)

For my project, I have to create a file manager which aims at storing many files (from many locations) and exposing URLs to download them. In a microservice ecosystem (I am used to using Spring Boot), I wonder what the best way is to exchange such files, i.e. to send files to the file manager. On a o...
OlivierTerrien
1 vote · 1 answer · 43 views

How to use Aws Temporary credentials in Nifi

I have to use AWS temporary credentials (AccessKey, SecretKey, and Token) within a NiFi process to access S3 objects. The AccessKey, SecretKey, and Token will be provided by an API call. How do I use these temporary credentials in the NiFi ListS3 processor, etc.? One of the options I found is using AWSCredentialsProvid...
Ani
1 vote · 1 answer · 27 views

How to properly apply HashPartitioner before a join in Spark?

To reduce shuffling during the join of two RDDs, I decided to partition them using HashPartitioner first. Here is how I do it. Am I doing it correctly, or is there a better way? val rddA = ... val rddB = ... val numOfPartitions = rddA.getNumPartitions val rddApartitioned = rddA.partiti...
MetallicPriest
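For comparison, a minimal sketch of the usual pattern: both pair RDDs get the same partitioner instance and are persisted, so the join can reuse the co-partitioning instead of reshuffling:

    import org.apache.spark.HashPartitioner

    val numOfPartitions = rddA.getNumPartitions
    val partitioner = new HashPartitioner(numOfPartitions)

    // same partitioner for both sides; persist so the partitioning is not recomputed
    val rddApartitioned = rddA.partitionBy(partitioner).persist()
    val rddBpartitioned = rddB.partitionBy(partitioner).persist()

    val joined = rddApartitioned.join(rddBpartitioned) // no further shuffle needed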
1 vote · 1 answer · 47 views

How to trigger camel route from client request?

I have this route: from('timer://test?repeatCount=1').routeId('newRoute') .streamCaching() .process(exchange -> exchange.getIn() .setBody(exchange.getIn() .getBody())) .marshal() .json(JsonLibrary.Jackson) .setHeader(Exchange.HTTP_METHOD, constant('GET')) .setHeader(Exchange.CONTENT_TYPE, constant('...
Bambus
1 vote · 2 answers · 40 views

Sql Between Clause with multiple columns

I am trying to get the dates between Jan 2016 and March 2018. I have used the query below, but it is not returning proper results. The input columns come at runtime; we cannot convert them to a date, as the input columns can contain a quarter. SELECT Year,Month,DayOfMonth FROM period WHERE (Y...
dassum
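BETWEEN compares each column independently, so a (Year, Month) range needs an explicit composite comparison. A sketch of the Jan 2016 through March 2018 window, assuming numeric Year and Month columns (shown via spark.sql, but the WHERE clause is plain SQL):

    spark.sql("""
      SELECT Year, Month, DayOfMonth
      FROM period
      WHERE (Year > 2016 OR (Year = 2016 AND Month >= 1))
        AND (Year < 2018 OR (Year = 2018 AND Month <= 3))
    """)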
1 vote · 2 answers · 43 views

Kafka consumer is reading last committed offset on re-start (Java)

I have a Kafka consumer for which enable.auto.commit is set to false. Whenever I restart my consumer application, it always reads the last committed offset again and then the next offsets. For example, the last committed offset is 50; when I restart the consumer, it reads offset 50 again first and then the next...
svbp
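Kafka's convention is that the committed offset is the position of the next record to consume, so committing record.offset() itself makes the same record reappear after a restart. A sketch of committing offset + 1, where consumer and record come from the usual poll loop and the topic name is a placeholder:

    import java.util.Collections
    import org.apache.kafka.clients.consumer.OffsetAndMetadata
    import org.apache.kafka.common.TopicPartition

    // after processing `record`, commit the position of the NEXT record to read
    val tp = new TopicPartition("my-topic", record.partition())
    consumer.commitSync(Collections.singletonMap(tp, new OffsetAndMetadata(record.offset() + 1)))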
1 vote · 2 answers · 79 views

How to expose a headless service for a StatefulSet cassandra cluster externally in Kubernetes

I'm setting up a multi-node Cassandra cluster in Kubernetes (Azure AKS). Since this is a headless service with StatefulSet pods and no external IP, how can I connect my Spark application to Cassandra inside the Kubernetes cluster? We have tried with the cluster IP and the ingress IP as well, but only single...
suraj1287
1 vote · 2 answers · 33 views

How to create an unique autogenerated Id column in a spark dataframe

I have a dataframe where I have to generate a unique id in one of the columns. This id has to be generated with an offset, because I need to persist this dataframe with the autogenerated id; if new data comes in, the autogenerated id should not collide with the existing ones. I checked the mon...
Ayan Biswas
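One way to get dense ids starting at an offset (which monotonically_increasing_id does not provide, since its values are sparse and partition-dependent) is zipWithIndex; a sketch where offset is hypothetical and would be, e.g., one past the maximum id already persisted:

    import org.apache.spark.sql.Row
    import org.apache.spark.sql.types.{LongType, StructField}

    val offset = 1000L // hypothetical: one past the max id already persisted

    val rowsWithId = df.rdd.zipWithIndex.map {
      case (row, idx) => Row.fromSeq(row.toSeq :+ (idx + offset))
    }
    val withId = spark.createDataFrame(rowsWithId, df.schema.add(StructField("id", LongType)))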

View additional questions