Questions tagged [avro]
813 questions
1 vote · 1 answer · 262 views
Sqoop: Avro with Gzip Codec failing
When trying to import tables to HDFS using Sqoop with --as-avrodatafile and GzipCodec, the import fails with the exception below. I'm running the CDH7 Cloudera quickstart Docker image.
Is there a reason we cannot use Gzip with Avro, or is there some missing configuration that is causing this?
Note: Gzip works...
1 vote · 3 answers · 4k views
How to create an empty DataFrame in Spark
I have a set of Avro-based Hive tables and I need to read data from them. Because Spark SQL uses Hive SerDes to read the data from HDFS, it is much slower than reading HDFS directly, so I have used the Databricks spark-avro jar to read the Avro files from the underlying HDFS dir.
Everything works fine except w...
1 vote · 2 answers · 722 views
How to convert an Avro GenericRecord to valid JSON while converting timestamp fields from milliseconds to datetime?
How do I convert an Avro GenericRecord to JSON while converting timestamp fields from milliseconds to datetime?
Currently using Avro 1.8.2.
Timestamp tsp = new Timestamp(1530228588182L);
Schema schema = SchemaBuilder.builder()
.record("hello")
.fields()
.name("tsp").type(LogicalTypes.timestampMilli...
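The millisecond-to-datetime conversion this question asks about can be sketched without any Avro machinery at all; below is a minimal stdlib-only illustration (the function name is my own, not from the question):

```python
from datetime import datetime, timezone

def millis_to_iso(millis):
    """Convert epoch milliseconds to an ISO-8601 UTC datetime string."""
    return datetime.fromtimestamp(millis / 1000, tz=timezone.utc).isoformat()

# The timestamp from the question: 1530228588182 ms -> 2018-06-28T23:29:48 UTC
print(millis_to_iso(1530228588182))
```

In a real GenericRecord-to-JSON pass, this conversion would be applied to each field whose schema carries the timestamp-millis logical type.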
1 vote · 3 answers · 821 views
How can I do functional tests for Kafka Streams with Avro (Schema Registry)?
A brief explanation of what I want to achieve:
I want to do functional tests for a Kafka Streams topology (using TopologyTestDriver) with Avro records.
Issue: I can't 'mock' the Schema Registry to automate schema publishing/reading.
What I have tried so far is using MockSchemaRegistryClient to try to mock the...
1 vote · 1 answer · 91 views
Loading Avro Data into BigQuery via command-line?
I have created an Avro Hive table and loaded data into it from another table using the Hive insert-overwrite command. I can see the data in the Avro Hive table, but when I try to load it into a BigQuery table, it gives an error.
Table schema:
CREATE TABLE `adityadb1.gold_hcth_prfl_datatype_accepten...
1 vote · 0 answers · 90 views
Standard serialization protocol to serialize set of objects to disk
I am looking for a standard protocol that provides the ability to serialize a set of objects (of the same type) to a file, but also provides an easy way to align to an object boundary if the reader/deserializer starts reading from a random byte offset.
After googling, I found that Apache Avro provides this functionality usi...
1 vote · 0 answers · 8 views
createDirectStream with Avro messages
Initially, I had to process the information from a text file:
C1_4,C2_4,C1______10,01/12/2015,30/12/2015,123456789,S,12345
Now I need to process the same information, but in Avro format. How can I do it?
Before, I used this code:
createDirectStream[String, String, StringDecoder, StringDecode...
1 vote · 0 answers · 198 views
“Value” causing issues in schema generation
I have an object like this:
"Meta": {
  "Type": 10,
  "Key": "Meta",
  "Value": {
1 vote · 0 answers · 368 views
Cannot deserialize data using Apache Avro
I have a Spring Boot application that sends and receives data from a Kafka broker; I'm using Apache Avro as a SerDe.
What I've done so far is generate the class using the Maven plugin; the schema is fairly simple:
{"namespace": "com.domain",
 "type": "record",
 "name": "User",
 "fields": [
   {"name": "name...
1 vote · 1 answer · 219 views
Avro schema record field name starting with a number
Avro documentation says:
The name portion of a fullname, record field names, and enum symbols must:
start with [A-Za-z_]
subsequently contain only [A-Za-z0-9_]
Is it possible somehow to escape the first rule and have a record field name starting with a digit, e.g. 123ColumnName? Maybe via 'escaping' or...
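There is no escape mechanism in the spec, so the usual workaround is to sanitize the name and keep the original spelling as a custom schema attribute (Avro ignores unknown schema properties). A minimal sketch, with an illustrative helper name:

```python
import re

def to_avro_field(column_name):
    """Return an Avro-legal field definition for an arbitrary column name.

    Avro names must match [A-Za-z_][A-Za-z0-9_]*, so an illegal leading
    character is prefixed with '_' and the original name is preserved in a
    custom attribute (Avro ignores unknown schema properties).
    """
    safe = column_name if re.match(r"[A-Za-z_]", column_name) else "_" + column_name
    safe = re.sub(r"[^A-Za-z0-9_]", "_", safe)
    field = {"name": safe, "type": "string"}
    if safe != column_name:
        field["originalName"] = column_name  # custom, non-standard attribute
    return field

print(to_avro_field("123ColumnName"))
```

Any consumer that needs the original column name would have to read the custom attribute back; Avro itself only sees the sanitized name.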
1 vote · 2 answers · 714 views
Avro JSON additional field
I have the following Avro schema:
{
  "type":"record",
  "name":"test",
  "namespace":"test.name",
  "fields":[
    {"name":"items","type":
      {"type":"array",
       "items":
         {"type":"record","name":"items",
          "fields":[
            {"name":"name","type":"string"},
            {"name":"state","type":"string"}
          ]
         }
      }
    },
    {"name":"firstname","type":"stri...
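If the problem is that incoming JSON carries fields the schema does not declare, one pragmatic approach is to prune each datum down to the schema's declared field names before decoding. A minimal sketch (the helper name is illustrative, not a real Avro API):

```python
def prune_to_schema(record_schema, datum):
    """Drop keys from `datum` that are not declared in the Avro record schema."""
    allowed = {f["name"] for f in record_schema["fields"]}
    return {k: v for k, v in datum.items() if k in allowed}

schema = {"type": "record", "name": "test",
          "fields": [{"name": "firstname", "type": "string"}]}

# The unknown "extra" key is discarded before the datum reaches the decoder.
print(prune_to_schema(schema, {"firstname": "Ada", "extra": 1}))
```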
1 vote · 0 answers · 152 views
GenericRecord to GenericRecord using sub schema (reader schema) without Encode/Decode
I am looking for an efficient way to create a GenericRecord from another GenericRecord using a subschema (reader schema), without using an Encoder/Decoder. For example, I have the following schemas:
String fullSchema = "{\"namespace\": \"example.avro\",\n" +
    " \"type\": \"record\",\n" +
    " \"name\": \"User\",\...
1 vote · 0 answers · 153 views
How to POST Avro bytes to a Flask endpoint
Problem: when a POST request is sent to a Flask endpoint where a field consists of raw bytes (Apache Avro format), Flask automatically attempts to decode the bytes into Unicode, which corrupts the data.
For example, when a POST request is sent via the Python test client as follows:
# part of a python un...
1 vote · 1 answer · 631 views
How to resolve The datum is not an example of schema (Avro::IO::AvroTypeError)
I am a newbie to Avro with Ruby, and basically to programming.
While performing some basic operations on Avro with Ruby, I ran into some issues with the schema.
Below is the code.
require 'rubygems'
require 'avro'
require 'mysql2'
require 'json'
require 'multi_json'
# setup mysql
db = Mysql2::Client....
1 vote · 0 answers · 107 views
How to import an Avro file in ADLS into an Azure Analysis Services model?
Now that ADLS is supported as a back end in Azure Analysis Services, we want to import an Avro file from ADLS into Azure Analysis Services.
The file size will be in the range of 2-4 TB.
The Azure online documentation covers only CSV, not Avro. Is it possible to import Avro from ADLS ( azure da...
1 vote · 1 answer · 501 views
Read Avro file with python to create a SQL table
I'm trying to create a SQL table from an Avro file which contains the structure of my table:
{
  "type" : "record",
  "name" : "warranty",
  "doc" : "Schema generated by Kite",
  "fields" : [ {
    "name" : "id",
    "type" : "long",
    "doc" : "Type inferred from '1'"
  }, {
    "name" : "train_id",
    "type" : "long",
    "doc" :...
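Mapping Avro primitive types to SQL column types is mostly a lookup table. Here is a minimal sketch that emits a CREATE TABLE statement from a schema dict; the type mapping and helper name are my own assumptions, and real schemas (unions, logical types) would need more cases:

```python
# Illustrative mapping from Avro primitive types to ANSI-ish SQL types.
AVRO_TO_SQL = {"long": "BIGINT", "int": "INT", "string": "VARCHAR(255)",
               "double": "DOUBLE", "float": "FLOAT", "boolean": "BOOLEAN"}

def create_table_sql(schema):
    """Build a CREATE TABLE statement from an Avro record schema dict."""
    cols = ", ".join(f'{f["name"]} {AVRO_TO_SQL[f["type"]]}'
                     for f in schema["fields"])
    return f'CREATE TABLE {schema["name"]} ({cols});'

schema = {"type": "record", "name": "warranty",
          "fields": [{"name": "id", "type": "long"},
                     {"name": "train_id", "type": "long"}]}
print(create_table_sql(schema))
```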
1 vote · 1 answer · 460 views
How to get the avro schema from StructType
I have a DataFrame:
Dataset dataset = getSparkInstance().createDataFrame(newRDD, struct);
dataset.schema() returns a StructType.
But I want to store the actual schema in a sample.avsc file.
Basically, I want to convert the StructType to an Avro schema file (.avsc).
Any idea?
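One way to approach this without extra libraries is to translate the JSON form of the StructType (dataset.schema().json() on the JVM, df.schema.jsonValue() in PySpark) into an Avro record schema. A simplified sketch covering only primitive types; the mapping table is an assumption, not a complete one:

```python
import json

# Illustrative subset of the Spark SQL -> Avro type mapping.
SPARK_TO_AVRO = {"string": "string", "long": "long", "integer": "int",
                 "double": "double", "boolean": "boolean"}

def struct_to_avro(struct, record_name="topLevelRecord"):
    """Translate a StructType JSON dict into an Avro record schema.

    Nullable fields become unions with "null", mirroring how Avro
    expresses optionality."""
    fields = []
    for f in struct["fields"]:
        avro_type = SPARK_TO_AVRO[f["type"]]
        if f.get("nullable", True):
            avro_type = ["null", avro_type]
        fields.append({"name": f["name"], "type": avro_type})
    return {"type": "record", "name": record_name, "fields": fields}

struct = {"type": "struct",
          "fields": [{"name": "value", "type": "string", "nullable": True}]}
print(json.dumps(struct_to_avro(struct)))
```

Writing the resulting dict out with json.dump gives the sample.avsc file the question asks for.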
1 vote · 0 answers · 448 views
How to use Schema Registry and AvroConverter in Kafka Source connector?
I'm trying to write custom Source and Sink Kafka connectors for MongoDB with Schema Registry and Avro. I have configured the key and value converters as AvroConverter and the Schema Registry URL in the properties. What code do I have to add in my connector so that it converts the data to Avro and v...
1 vote · 0 answers · 49 views
How does Schema Registry integrate with Kafka Source Connector?
I have added Topic-Key and Topic-Value schemas for a given topic using the REST APIs. In my custom connector, do I need to create a schema again using SchemaBuilder? How do I access the registered schemas inside my connector?
1 vote · 0 answers · 226 views
Using Spark fileStream with Avro Data Input
I'm trying to create a Spark Streaming application using fileStream(). The documentation specifies:
streamingContext.fileStream[KeyClass, ValueClass, InputFormatClass](dataDirectory)
I need to pass KeyClass, ValueClass, InputFormatClass. My main question is what I can use for these parameters for...
1 vote · 0 answers · 74 views
Schema design in avro and protobufs
Currently, a lot of the data we store is in the form of Avro records or serialized protobuf bytes. I want to see how I can design an efficient schema for my data to improve reading/parsing/serialization speed in data pipelines.
For example consider the following case
Schema A : Column 1 : columnNa...
1 vote · 0 answers · 70 views
How to store data on HDFS using Flume with an existing schema file
I have JSON data coming from a source, and I want to dump it onto HDFS using Flume in Avro format, for which I already have an avsc file. I am using the following configuration for the sink, but it is not picking up my avsc file and instead creates its own schema:
agent1.sinks.sink1.type = hdfs agent1.sinks.sink1.serializer...
1 vote · 0 answers · 80 views
Avro: detect union field types in python
Given the following schema that contains a union of three types:
{
  "namespace":"com.example",
  "type":"record",
  "name":"TestObject",
  "fields":[
    {
      "name":"element",
      "type": [
        "null",
        "string",
        {
          "name":"element_type",
          "type": "enum",
          "symbols": ["TYPE1", "TYPE2"]
        }
      ],
      "default":"TYPE1"
    }
  ]
}
Is it pos...
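Branch detection can be sketched with plain isinstance checks; note that in this particular union any Python string matches the "string" branch before the enum branch is ever considered, which is exactly the ambiguity the question runs into. An illustrative helper (not a real Avro API):

```python
def union_branch(union_schema, datum):
    """Return the name of the union branch a Python datum matches.

    Simplified: handles only "null", "string", and enum branches."""
    for branch in union_schema:
        if branch == "null" and datum is None:
            return "null"
        if branch == "string" and isinstance(datum, str):
            return "string"
        if isinstance(branch, dict) and branch.get("type") == "enum" \
                and datum in branch.get("symbols", []):
            return branch["name"]
    raise ValueError(f"{datum!r} matches no branch")

union = ["null", "string",
         {"name": "element_type", "type": "enum", "symbols": ["TYPE1", "TYPE2"]}]
print(union_branch(union, None))     # the null branch
print(union_branch(union, "hello"))  # the string branch wins for any str
```

Because "TYPE1" is itself a string, distinguishing the enum branch from the string branch requires extra context (e.g. checking membership in the symbol list first), which is why such unions are hard to decode unambiguously.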
1 vote · 0 answers · 92 views
Apache Beam DynamicAvroDestinations DefaultFilenamePolicy with String instead of ResourceId
According to the write example on https://beam.apache.org/documentation/sdks/javadoc/2.4.0/org/apache/beam/sdk/io/AvroIO.html
The following code should work:
public FilenamePolicy getFilenamePolicy(Integer userId) {
return DefaultFilenamePolicy.fromParams(new Params().withBaseFilename(baseDir + "/us...
1 vote · 0 answers · 227 views
.NET Core 2.0 Avro CodeGenerator cannot handle nested structures or IDL
I am trying to use the Microsoft.Hadoop.Avro.Utils library in a C# .NET Core 2.0 application to generate C# classes for my Avro schema, but I cannot get it to handle nested records when creating a simple Avro JSON schema.
In general, I would much rather use the Avro IDL format to describe my dat...
1 vote · 0 answers · 76 views
Avro equivalent of Protocol Buffers 'oneof' type
I wanted to know if there is any good Avro type or definition (or combination thereof) to simulate protobuf's 'oneof' type. I didn't see an easy way to make a field optional, nor to restrict it to one selection of many.
If there's no good equivalent, how would you suggest trying to implement this with Avro...
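The closest Avro equivalent of protobuf's oneof is a union; adding "null" as a branch also makes the field optional. A hedged sketch of what such a field definition could look like (the field and branch names are invented for illustration):

```python
import json

# protobuf:  oneof payload { string text = 1; int64 number = 2; }
# Avro's closest equivalent is a union; "null" as the first branch
# also makes the field optional with a default of null.
oneof_equivalent = {
    "name": "payload",
    "type": ["null", "string", "long"],  # at most one branch is populated
    "default": None,
}
print(json.dumps(oneof_equivalent))
```

Unlike oneof, an Avro union does not name its branches, so two branches of the same primitive type cannot be distinguished; wrapping each alternative in its own record type is the usual workaround.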
1 vote · 2 answers · 2.3k views
Unable to generate avro generic record from object
I am trying to send Avro records to a Kafka topic using a Kafka producer. I have a User class and I am sending an object of that class. The code below works fine if I use avroRecord.put() to set each attribute. But what I want is to create a GenericRecord from an object without using avroRecord.put() for...
1 vote · 0 answers · 188 views
Spark kafka avro producer in structured streaming
I have a working UDF that performs a side effect: sending a Kafka message in Avro, which I know is not the purpose of a UDF. I could not find a good way to accomplish this, and this works... but I'm wondering if this is a really bad idea. Does someone have a better way of doing this?
#if you don't have a s...
1 vote · 0 answers · 205 views
Avro Schema vs. Scala Case Class for Spark Datasets
I am curious if there is any significant performance difference between using schemas defined in Scala case classes versus defining schemas with Apache Avro for Spark Datasets. Currently I have a schema that looks something like this:
root
|-- uniqueID: string (nullable = true)
|-- fieldCount: integ...
1 vote · 0 answers · 231 views
Error reading Avro file in Python
I'm trying to read an Avro file into Python; the following code works on OS X and a Linux box, but breaks on Windows:
from avro.datafile import DataFileReader, DataFileWriter
from avro.io import DatumReader, DatumWriter
reader = DataFileReader(open('my_file.avro', 'rb'), DatumReader())
for line in...
1 vote · 0 answers · 154 views
NiFi avro schema using regex to validate a string
I have an Avro schema in NiFi which validates the columns of a CSV file; all is working well. However, I'd ideally like an extra level of validation on certain string columns, to test that they adhere to specific patterns, for example ABC1234-X, or whatever. Here's the wrinkle, though: the Avro...
1 vote · 0 answers · 229 views
Kafka JDBC sink connector does not correctly convert timestamps
I have a schema:
{
  "type" : "record",
  "name" : "test",
  "namespace" : "test",
  "fields" : [ {
    "name" : "time",
    "type": [ "null", {
      "type": "long",
      "logicalType": "timestamp-millis"
    }]
  },
  ....
  {
    "name" : "time2",
    "type": ["null", {
      "type" : "long",
      "logicaltype": "timestamp-millis"
    }]
  }
}
But when kaf...
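One thing worth checking here: the second field spells its key logicaltype (lowercase t), and Avro silently ignores unknown schema properties, so that logical type is never applied. A small sketch of a linter that flags such miscased keys (entirely my own helper, not part of Avro):

```python
def find_miscased_keys(node, expected=("logicalType",), path=""):
    """Recursively report schema keys that differ from an expected spelling
    only by case -- e.g. 'logicaltype' instead of 'logicalType', which Avro
    treats as an unknown property, so the logical type is never applied."""
    hits = []
    if isinstance(node, dict):
        for key, value in node.items():
            for want in expected:
                if key != want and key.lower() == want.lower():
                    hits.append(f"{path}/{key}")
            hits += find_miscased_keys(value, expected, f"{path}/{key}")
    elif isinstance(node, list):
        for i, item in enumerate(node):
            hits += find_miscased_keys(item, expected, f"{path}[{i}]")
    return hits

schema = {"fields": [
    {"name": "time",
     "type": ["null", {"type": "long", "logicalType": "timestamp-millis"}]},
    {"name": "time2",
     "type": ["null", {"type": "long", "logicaltype": "timestamp-millis"}]},
]}
print(find_miscased_keys(schema))  # only time2's key is flagged
```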
1 vote · 1 answer · 246 views
How to write the ValueJoiner when joining two Kafka Streams defined using Avro Schemas?
I am building an ecommerce application, where I am currently dealing with two data feeds: order executions, and broken sales. A broken sale would be an invalid execution, for a variety of reasons. A broken sale would have the same order ref number as the order, so the join is on order ref # and line...
1 vote · 0 answers · 81 views
Spark performance is not improving
I am using Zeppelin to read Avro files with sizes in the GBs and billions of records. I have tried with 2 instances and 7 instances on AWS EMR, but the performance seems equal. With 7 instances it is still taking a lot of time. The code is:
val snowball = spark.read.avro(snowBallUrl + folder + prefix...
1 vote · 0 answers · 37 views
Schema validation of a multi-reference chained schema
I want to do three things
Validate JSON against a JSON-Schema
Create JSON-Schema to AVRO Schema converter
Create JSON-Schema to Hive Table converter
The problem I'm facing is that the schema has a referencing chain.
I'm trying to use this JSON Schema validator, which resolves references and validates, but...
1 vote · 2 answers · 60 views
How is a schema from the Schema Registry propagated over Replicator?
How do schemas from Confluent Schema Registry get propagated by Confluent Replicator to the destination Kafka cluster and Schema Registry?
Is each replicated message schema contained in it or are schemas replicated somehow separately through a separate topic?
I didn't see any configuration possibilities...
1 vote · 0 answers · 80 views
Can I use Apache Avro just for JSON document schema validation?
I know that Avro is a very fast data serialization and deserialization system. It also provides rich data structures for schema definition. Is it possible to use Avro just for JSON document schema validation? For example, I have thousands of JSON documents and I want to validate those JSON documents a...
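Avro can act as a structural validator, though it is not designed as a JSON Schema replacement. As a toy illustration of the idea only (covering primitives and flat records, not real Avro semantics such as unions or logical types):

```python
# Illustrative mapping from Avro primitive type names to Python types.
PRIMITIVES = {"string": str, "long": int, "int": int,
              "double": float, "boolean": bool}

def validate(schema, datum):
    """Tiny validator for a subset of Avro: primitives and records.

    Returns True when `datum` structurally conforms to `schema`."""
    if isinstance(schema, str):
        return isinstance(datum, PRIMITIVES[schema])
    if schema.get("type") == "record":
        return (isinstance(datum, dict)
                and all(validate(f["type"], datum.get(f["name"]))
                        for f in schema["fields"]))
    return False

user = {"type": "record", "name": "User",
        "fields": [{"name": "name", "type": "string"}]}
print(validate(user, {"name": "Ada"}), validate(user, {"name": 42}))
```

For production use, the avro package's schema validation (or a dedicated JSON Schema library) would be the safer route.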
1 vote · 0 answers · 46 views
Loading of Apache Avro plugin for Tranquility fails with Exception
For the Kafka Avro producer, I run:
./kafka-avro-console-producer --broker-list localhost:9092 --topic pageviews --property value.schema='{"type":"record","name":"mypageviews","fields":[{"name":"time","type":"string"},{"name":"url","type":"string"},{"name":"user","type":"string"},{"name":"latencyMs"...
1 vote · 1 answer · 924 views
Spark read avro
Trying to read an Avro file:
val df = spark.read.avro(file)
This runs into: Avro schema cannot be converted to a Spark SQL StructType: [ "null", "string" ]
I tried to manually create a schema, but now run into the following:
val s = StructType(List(StructField("value", StringType, nullable = true)))...
1 vote · 1 answer · 282 views
Build error using Apache Avro generated files
I am using Gradle to build a simple app using Apache Avro. The relevant files follow:
File build.gradle
plugins {
id 'com.commercehub.gradle.plugin.avro' version '0.9.0'
}
apply plugin: 'application'
group 'ahmed'
version '1.0-SNAPSHOT'
sourceCompatibility = 1.8
dependencies {
compile 'org.apa...