Questions tagged [avro]

1 vote · 1 answer · 262 views

Sqoop: Avro with Gzip Codec failing

When trying to import tables to HDFS using Sqoop with --as-avrodatafile and GzipCodec, the import fails with the exception below. I'm running the CDH7 Cloudera quickstart Docker image. Is there a reason we cannot use Gzip with Avro, or is some missing configuration causing this? Note: Gzip works...
1 vote · 3 answers · 4k views

How to create an empty DataFrame in Spark

I have a set of Avro-based Hive tables and I need to read data from them. As Spark SQL uses Hive SerDes to read the data from HDFS, it is much slower than reading HDFS directly, so I have used the Databricks Spark-Avro jar to read the Avro files from the underlying HDFS dir. Everything works fine except w...
Vinay Kumar
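The usual workaround, sketched here under the assumption of Spark 2.x: build the empty DataFrame from an empty RDD plus an explicit schema, so downstream code still sees the expected columns when a directory yields no Avro files (the schema below is illustrative).

```scala
import org.apache.spark.sql.{Row, SparkSession}
import org.apache.spark.sql.types.{StringType, StructField, StructType}

// Create an empty DataFrame carrying an explicit schema.
val spark = SparkSession.builder().appName("empty-df").master("local[*]").getOrCreate()
val schema = StructType(Seq(StructField("name", StringType, nullable = true)))
val emptyDf = spark.createDataFrame(spark.sparkContext.emptyRDD[Row], schema)
emptyDf.printSchema() // root |-- name: string (nullable = true)
```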
1 vote · 2 answers · 722 views

How to convert an Avro GenericRecord to valid JSON while converting timestamp fields from milliseconds to datetime?

How to convert an Avro GenericRecord to JSON while converting timestamp fields from milliseconds to datetime? Currently using Avro 1.8.2. Timestamp tsp = new Timestamp(1530228588182L); Schema schema = SchemaBuilder.builder() .record("hello") .fields() .name("tsp").type(LogicalTypes.timestampMilli...
user1870400
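Avro 1.8's JSON encoder writes timestamp-millis values as plain longs, so one common workaround is formatting the millis as an ISO-8601 string yourself before building the JSON output. A minimal sketch (schema and values illustrative):

```scala
import java.time.Instant
import org.apache.avro.{LogicalTypes, SchemaBuilder}

// A long schema carrying the timestamp-millis logical type, as in the question.
val tspSchema = LogicalTypes.timestampMillis().addToSchema(SchemaBuilder.builder().longType())

// Convert the raw millis to a human-readable datetime for the JSON output.
val millis = 1530228588182L
val iso = Instant.ofEpochMilli(millis).toString
println(s"$millis -> $iso") // 1530228588182 -> 2018-06-28T23:29:48.182Z
```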
1 vote · 3 answers · 821 views

How can I do functional tests for Kafka Streams with Avro (Schema Registry)?

A brief explanation of what I want to achieve: I want to do functional tests for a Kafka Streams topology (using TopologyTestDriver) with Avro records. Issue: I can't 'mock' the Schema Registry to automate the schema publishing/reading. What I have tried so far is using MockSchemaRegistryClient to try to mock the...
Ramon jansen gomez
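A sketch of the usual test setup, assuming Confluent's kafka-streams-avro-serde is on the classpath: hand one in-memory MockSchemaRegistryClient to every serde used with the TopologyTestDriver, so schemas register locally and no real registry is contacted.

```scala
import java.util.Collections
import io.confluent.kafka.schemaregistry.client.MockSchemaRegistryClient
import io.confluent.kafka.streams.serdes.avro.GenericAvroSerde

// One shared mock registry for all serdes in the test.
val registryClient = new MockSchemaRegistryClient()
val avroSerde = new GenericAvroSerde(registryClient)
// The url key must still be set, but the mock client never contacts it.
avroSerde.configure(Collections.singletonMap("schema.registry.url", "http://unused:8081"), false)
```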
1 vote · 1 answer · 91 views

Loading Avro Data into BigQuery via command-line?

I have created an Avro Hive table and loaded data into it from another table using a Hive insert-overwrite command. I can see the data in the Avro Hive table, but when I try to load it into a BigQuery table, it gives an error. Table schema:- CREATE TABLE `adityadb1.gold_hcth_prfl_datatype_accepten...
Vishwanath Sharma
1 vote · 0 answers · 90 views

Standard serialization protocol to serialize a set of objects to disk

I am looking for a standard protocol that provides the ability to serialize a set of objects (same type) to a file, but also provides an easy way to align to an object boundary if the reader/deserializer starts reading from a random byte offset. After googling, I found out that Apache Avro provides this functionality usi...
user2774767
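Avro's object container files provide exactly this: sync markers are embedded between blocks, and DataFileReader.sync() realigns a reader that starts at an arbitrary byte offset. A small sketch (file name and offset illustrative):

```scala
import java.io.File
import org.apache.avro.file.DataFileReader
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}

val reader = new DataFileReader[GenericRecord](
  new File("data.avro"), new GenericDatumReader[GenericRecord]())
reader.sync(12345L) // jump to the first record boundary after byte 12345
while (reader.hasNext) println(reader.next())
reader.close()
```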
1 vote · 0 answers · 8 views

createDirectStream with Avro messages

At first I had to process the information from a text file: C1_4,C2_4,C1______10,01/12/2015,30/12/2015,123456789,S,12345 Now I need to process the same information, but in Avro format. How can I do it? Before, I used this code: createDirectStream[String, String, StringDecoder, StringDecode...
user2140391
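A sketch of the common pattern for the 0.8 direct stream, assuming the payload is Avro binary (broker, topic, and schema are illustrative): receive the value as raw bytes with DefaultDecoder, then decode each payload with Avro's BinaryDecoder.

```scala
import kafka.serializer.{DefaultDecoder, StringDecoder}
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
import org.apache.avro.io.DecoderFactory
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.kafka.KafkaUtils

val ssc = new StreamingContext(
  new SparkConf().setAppName("avro-direct-stream").setMaster("local[2]"), Seconds(5))
val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
val schemaJson = """{"type":"record","name":"Line","fields":[{"name":"raw","type":"string"}]}"""

val stream = KafkaUtils.createDirectStream[String, Array[Byte], StringDecoder, DefaultDecoder](
  ssc, kafkaParams, Set("mytopic"))
stream.map { case (_, bytes) =>
  // Parsed per record to keep the sketch serialization-safe; cache the schema in real code.
  val reader = new GenericDatumReader[GenericRecord](new Schema.Parser().parse(schemaJson))
  reader.read(null, DecoderFactory.get().binaryDecoder(bytes, null)).toString
}.print()
```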
1 vote · 0 answers · 198 views

“Value” causing issues in schema generation

I have an object like this: "Meta": { "Type": 10, "Key": "Meta", "Value": {
Matt
1 vote · 0 answers · 368 views

Cannot deserialize data using Apache Avro

I have a Spring Boot application that sends and receives data from a Kafka broker; I'm using Apache Avro as a SerDe. What I've done so far is generate the class using the Maven plugin; the schema is fairly simple: {"namespace": "com.domain", "type": "record", "name": "User", "fields": [ {"name": "name...
Ouerghi Yassine
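If the Confluent Avro deserializer is in play, these are the consumer settings that usually matter for this symptom (values illustrative); 'specific.avro.reader' makes the deserializer return the Maven-generated class instead of a GenericRecord, whose absence is a frequent cause of deserialization failures:

```scala
import java.util.Properties

val props = new Properties()
props.put("bootstrap.servers", "localhost:9092")
props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer")
props.put("value.deserializer", "io.confluent.kafka.serializers.KafkaAvroDeserializer")
props.put("schema.registry.url", "http://localhost:8081")
props.put("specific.avro.reader", "true") // deserialize into the generated User class
```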
1 vote · 1 answer · 219 views

Avro schema record field name starting with a number

The Avro documentation says: The name portion of a fullname, record field names, and enum symbols must: start with [A-Za-z_], subsequently contain only [A-Za-z0-9_]. Is it possible somehow to escape the first rule and have a record field name starting with a digit, i.e. 123ColumnName? Maybe via 'escaping' or...
Andrey Dmitriev
1 vote · 2 answers · 714 views

Avro JSON additional field

I have the following Avro schema: { "type":"record", "name":"test", "namespace":"test.name", "fields":[ {"name":"items","type": {"type":"array", "items": {"type":"record","name":"items", "fields":[ {"name":"name","type":"string"}, {"name":"state","type":"string"} ] } } }, {"name":"firstname","type":"stri...
ASe
1 vote · 0 answers · 152 views

GenericRecord to GenericRecord using a sub schema (reader schema) without Encoder/Decoder

I am looking for an efficient way to create a GenericRecord from another GenericRecord using a subschema (reader schema), without using an Encoder/Decoder. For example, I have the following schemas: String fullSchema = "{\"namespace\": \"example.avro\",\n" + " \"type\": \"record\",\n" + " \"name\": \"User\",\...
hlagos
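One encode/decode-free approach is a straight field-by-field projection: build the new record against the reader schema and copy values across by field name. A sketch, assuming every reader field exists in the writer record (defaults and nested renames not handled):

```scala
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericData, GenericRecord}
import scala.collection.JavaConverters._

// Project a full record onto a sub (reader) schema without Encoder/Decoder.
def project(full: GenericRecord, readerSchema: Schema): GenericRecord = {
  val out = new GenericData.Record(readerSchema)
  readerSchema.getFields.asScala.foreach(f => out.put(f.name(), full.get(f.name())))
  out
}
```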
1 vote · 0 answers · 153 views

How to POST Avro bytes to a Flask endpoint

Problem: when a POST request is sent to a Flask endpoint where a field consists of raw bytes (Apache Avro format), Flask automatically attempts to decode the bytes into Unicode, which messes up the data. For example, when a POST request is sent via the Python test client as follows: # part of a python un...
732b
1 vote · 1 answer · 631 views

How to resolve "The datum is not an example of schema" (Avro::IO::AvroTypeError)

I am a newbie to Avro with Ruby, and basically to programming. While I was performing some basic operations on Avro with Ruby, I saw some issues with the schema. Below is the code. require 'rubygems' require 'avro' require 'mysql2' require 'json' require 'multi_json' # setup mysql db = Mysql2::Client....
user265629
1 vote · 0 answers · 107 views

How to import an Avro file from ADLS into an Azure Analysis Services model?

Now that ADLS is supported as a back end in Azure Analysis Services, we want to import an Avro file from ADLS into Azure Analysis Services. The file size will be in the range of TBs (2-4). The Azure online documentation covers only CSV, not Avro. Is it possible to import Avro from ADLS (Azure Da...
Kaa
1 vote · 1 answer · 501 views

Read an Avro file with Python to create a SQL table

I'm trying to create a SQL table from an Avro file which contains the structure of my table: { "type" : "record", "name" : "warranty", "doc" : "Schema generated by Kite", "fields" : [ { "name" : "id", "type" : "long", "doc" : "Type inferred from '1'" }, { "name" : "train_id", "type" : "long", "doc" :...
amira khalifa
1 vote · 1 answer · 460 views

How to get the Avro schema from a StructType

I have a DataFrame: Dataset dataset = getSparkInstance().createDataFrame(newRDD, struct); dataset.schema() returns a StructType, but I want the actual schema stored in a sample.avsc file. Basically, I want to convert the StructType to an Avro schema file (.avsc). Any idea?
Sumit G
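A sketch using the Databricks spark-avro jar of that era, assuming its SchemaConverters is accessible from your version (Spark 2.4+ ships org.apache.spark.sql.avro.SchemaConverters.toAvroType instead): convert the StructType to an org.apache.avro.Schema and write its JSON form as the .avsc file.

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Paths}
import com.databricks.spark.avro.SchemaConverters
import org.apache.avro.SchemaBuilder
import org.apache.spark.sql.types.{LongType, StringType, StructField, StructType}

// Stand-in for dataset.schema(); record name and namespace are illustrative.
val struct = StructType(Seq(StructField("id", LongType), StructField("name", StringType)))
val avroSchema = SchemaConverters.convertStructToAvro(
  struct, SchemaBuilder.record("sample").namespace("com.example"), "com.example")
Files.write(Paths.get("sample.avsc"), avroSchema.toString(true).getBytes(StandardCharsets.UTF_8))
```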
1 vote · 0 answers · 448 views

How to use Schema Registry and AvroConverter in Kafka Source connector?

I'm trying to write custom Source and Sink Kafka connectors for MongoDB with Schema Registry and Avro. I have configured the key and value converters as AvroConverter and set the schema registry URL in the properties. What code do I have to add in my connector so that it converts the data to Avro and v...
Chitchat16
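Generally no Avro-specific code goes in the connector: the task emits Connect Schema/Struct values and the worker-configured AvroConverter handles Avro serialization and registry registration. A sketch of what a source task returns (topic, field names, and offsets illustrative):

```scala
import org.apache.kafka.connect.data.{Schema, SchemaBuilder, Struct}
import org.apache.kafka.connect.source.SourceRecord
import scala.collection.JavaConverters._

// Describe the value with Connect's schema API, not Avro's.
val valueSchema = SchemaBuilder.struct().name("com.example.Doc")
  .field("id", Schema.STRING_SCHEMA)
  .build()
val value = new Struct(valueSchema).put("id", "42")
val record = new SourceRecord(
  Map("source" -> "mongo").asJava, Map("offset" -> "0").asJava,
  "my-topic", valueSchema, value)
```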
1 vote · 0 answers · 49 views

How does Schema Registry integrate with Kafka Source Connector?

I have added Topic-Key and Topic-Value schemas for a given topic using the REST APIs. In my custom connector, do I need to create a schema again using SchemaBuilder? How do I access the registered schemas inside my connector?
Chitchat16
1 vote · 0 answers · 226 views

Using Spark fileStream with Avro Data Input

I'm trying to create a Spark Streaming application using fileStream(). The documentation specifies: streamingContext.fileStream[KeyClass, ValueClass, InputFormatClass](dataDirectory) I need to pass KeyClass, ValueClass, InputFormatClass. My main question is what I can use for these parameters for...
Nk.Pl
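For Avro container files, one workable triple is AvroKey[GenericRecord] / NullWritable / AvroKeyInputFormat. A sketch (directory illustrative):

```scala
import org.apache.avro.generic.GenericRecord
import org.apache.avro.mapred.AvroKey
import org.apache.avro.mapreduce.AvroKeyInputFormat
import org.apache.hadoop.io.NullWritable
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val ssc = new StreamingContext(
  new SparkConf().setAppName("avro-filestream").setMaster("local[2]"), Seconds(30))
val stream = ssc.fileStream[AvroKey[GenericRecord], NullWritable, AvroKeyInputFormat[GenericRecord]](
  "hdfs:///data/in")
// Map to String right away so Avro objects never need to be shipped between stages.
stream.map { case (key, _) => key.datum().toString }.print()
ssc.start()
```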
1 vote · 0 answers · 74 views

Schema design in Avro and protobufs

Currently a lot of the data that we store is in the form of Avro records or serialized protobuf bytes. I want to see how I can design an efficient schema for my data to improve reading/parsing/serializing speed in data pipelines. For example, consider the following case: Schema A : Column 1 : columnNa...
user179156
1 vote · 0 answers · 70 views

How to store data on HDFS using Flume with an existing schema file

I have JSON data coming from a source and I want to dump it on HDFS using Flume in Avro format, for which I already have an avsc file. I am using the following configuration for the sink, but it's not picking up my avsc file and instead creates its own schema: agent1.sinks.sink1.type = hdfs agent1.sinks.sink1.serializer...
User_qwerty
1 vote · 0 answers · 80 views

Avro: detect union field types in Python

Given the following schema that contains a union of three types: { "namespace":"com.example", "type":"record", "name":"TestObject", "fields":[ { "name":"element", "type": [ "null", "string", { "name":"element_type", "type": "enum", "symbols": ["TYPE1", "TYPE2"] } ], "default":"TYPE1" } ] } Is it pos...
Jonny5
1 vote · 0 answers · 92 views

Apache Beam DynamicAvroDestinations DefaultFilenamePolicy with String instead of ResourceId

According to the write example at https://beam.apache.org/documentation/sdks/javadoc/2.4.0/org/apache/beam/sdk/io/AvroIO.html, the following code should work: public FilenamePolicy getFilenamePolicy(Integer userId) { return DefaultFilenamePolicy.fromParams(new Params().withBaseFilename(baseDir + "/us...
bjorndv
1 vote · 0 answers · 227 views

.NET Core 2.0 Avro CodeGenerator cannot handle nested structures or IDL

I am trying to use the Microsoft.Hadoop.Avro.Utils library in a C# .NET Core 2.0 application to generate C# classes for my Avro schema, but I cannot get it to handle nested records when creating a simple Avro JSON schema. In general, I would much rather use the Avro IDL format to describe my dat...
aweis
1 vote · 0 answers · 76 views

Avro equivalent of Protocol Buffers 'oneof' type

I wanted to know if there is any good Avro type or definition (or combination of them) to simulate protobuf's 'oneof' type. I didn't see an easy way to make a field optional, nor to restrict it to one selection of many. If there's no good equivalent, how would you suggest trying to implement this with Avro...
user8897013
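Avro's closest analogue to oneof is a union schema; making "null" the first branch also gives the optional-field behaviour. A sketch with two hypothetical record branches:

```scala
import org.apache.avro.SchemaBuilder

// A union of null + two record types, roughly protobuf's `oneof contact { ... }`.
val oneofLike = SchemaBuilder.unionOf()
  .nullType().and()
  .record("EmailContact").fields().requiredString("email").endRecord().and()
  .record("PhoneContact").fields().requiredString("phone").endRecord()
  .endUnion()
println(oneofLike.toString(true))
```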
1 vote · 2 answers · 2.3k views

Unable to generate an Avro GenericRecord from an object

I am trying to send Avro records to a Kafka topic using a Kafka producer. I have a User class and I am sending objects of that class. The code below works fine if I use avroRecord.put() to set each attribute. But what I want is to create a GenericRecord from an object without using avroRecord.put() for...
blasteralfred Ψ
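One way to avoid a put() per attribute is reflection: derive the schema from the class with ReflectData and copy the fields in a loop. A sketch with a stand-in User class (assumes flat, non-null fields; field names match the Avro schema that ReflectData induces):

```scala
import org.apache.avro.generic.GenericData
import org.apache.avro.reflect.ReflectData
import scala.collection.JavaConverters._

class User(val name: String, val age: Int) // stand-in for the question's User class

val user = new User("alice", 30)
val schema = ReflectData.get().getSchema(classOf[User])
val record = new GenericData.Record(schema)
schema.getFields.asScala.foreach { f =>
  val fld = classOf[User].getDeclaredField(f.name())
  fld.setAccessible(true)
  record.put(f.name(), fld.get(user)) // one loop instead of one put() per attribute
}
```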
1 vote · 0 answers · 188 views

Spark Kafka Avro producer in structured streaming

I have a working UDF that performs a side effect: sending a Kafka message in Avro, which I know is not the purpose of a UDF. I could not find a good way to accomplish this, and this works... but I'm wondering if this is a really bad idea. Does someone have a better way of doing this? #if you don't have a s...
Brian
1 vote · 0 answers · 205 views

Avro Schema vs. Scala Case Class for Spark Datasets

I am curious if there is any significant performance difference between using schemas defined in Scala case classes versus defining schemas with Apache Avro for Spark Datasets. Currently I have a schema that looks something like this: root |-- uniqueID: string (nullable = true) |-- fieldCount: integ...
Nate Parke
1 vote · 0 answers · 231 views

Error reading Avro file in Python

I'm trying to read an Avro file into Python, and the following code works on OSX and Linux boxes but breaks on Windows: from avro.datafile import DataFileReader, DataFileWriter from avro.io import DatumReader, DatumWriter reader = DataFileReader(open('my_file.avro', 'rb'), DatumReader()) for line in...
mgoldwasser
1 vote · 0 answers · 154 views

NiFi avro schema using regex to validate a string

I have an Avro schema in NiFi which validates the columns of a CSV file; all is working well. However, I'd ideally like an extra level of validation on certain string columns, to test that they adhere to specific patterns, for example ABC1234-X, or whatever. Here's the wrinkle, though: the Avro...
Mark Balmer
1 vote · 0 answers · 229 views

Kafka JDBC sink connector does not correctly convert timestamp

I have a schema { "type" : "record", "name" : "test", "namespace" : "test", "fields" : [ { "name" : "time", "type": [ "null", { "type": "long", "logicalType": "timestamp-millis" }] }, .... { "name" : "time2", "type": ["null", { "type" : "long", "logicaltype": "timestamp-millis" }] } } But when Kaf...
Mikhail
1 vote · 1 answer · 246 views

How to write the ValueJoiner when joining two Kafka Streams defined using Avro Schemas?

I am building an ecommerce application, where I am currently dealing with two data feeds: order executions, and broken sales. A broken sale would be an invalid execution, for a variety of reasons. A broken sale would have the same order ref number as the order, so the join is on order ref # and line...
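The ValueJoiner itself is just a function of the two joined values; the Avro part only determines the serdes. A sketch with stand-in case classes for the Avro-generated types, keyed on order ref:

```scala
import org.apache.kafka.streams.kstream.ValueJoiner

// Stand-ins for the Avro-generated order-execution and broken-sale classes.
case class OrderExecution(orderRef: String, qty: Int)
case class BrokenSale(orderRef: String, reason: String)
case class OrderWithBreak(orderRef: String, qty: Int, reason: String)

val joiner = new ValueJoiner[OrderExecution, BrokenSale, OrderWithBreak] {
  override def apply(o: OrderExecution, b: BrokenSale): OrderWithBreak =
    OrderWithBreak(o.orderRef, o.qty, b.reason)
}
```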
1 vote · 0 answers · 81 views

Spark performance is not improving

I am using Zeppelin to read Avro files that are GBs in size, with billions of records. I have tried with 2 instances and 7 instances on AWS EMR, but the performance seems equal. With 7 instances it is still taking a lot of time. The code is: val snowball = spark.read.avro(snowBallUrl + folder + prefix...
Waqar Ahmed
1 vote · 0 answers · 37 views

Schema validation of a multi-reference chained schema

I want to do three things: validate JSON against a JSON-Schema; create a JSON-Schema to Avro Schema converter; create a JSON-Schema to Hive table converter. The problem I'm facing is that the schema has a referencing chain. I'm trying to use this JSON Schema Validator, which resolves references and validates, but...
Sam
1 vote · 2 answers · 60 views

How is the schema from the Schema-Registry propagated over Replicator

How do schemas from the Confluent Schema-Registry get propagated by Confluent-Replicator to the destination Kafka cluster and Schema-Registry? Does each replicated message contain its schema, or are schemas replicated separately through a separate topic? I didn't see any configuration possibilities...
Neven
1 vote · 0 answers · 80 views

Can I use Apache Avro just for JSON documents schema validation?

I know that Avro is a very fast data serialization and deserialization system. It also provides rich data structures for schema definition. Is it possible to use Avro just for JSON document schema validation? For example, I have thousands of JSON documents and I want to validate those JSON documents a...
Deepak Bhatia
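Avro can serve as a rough JSON validator: decoding the document against the schema throws if the structure or types don't match. A sketch (schema illustrative; note Avro's JSON encoding has quirks around unions, so it is stricter than a general JSON-Schema validator):

```scala
import org.apache.avro.Schema
import org.apache.avro.generic.{GenericDatumReader, GenericRecord}
import org.apache.avro.io.DecoderFactory

val schema = new Schema.Parser().parse(
  """{"type":"record","name":"Doc","fields":[{"name":"id","type":"long"}]}""")
val reader = new GenericDatumReader[GenericRecord](schema)

// A document is "valid" if it decodes cleanly against the schema.
def isValid(json: String): Boolean =
  try { reader.read(null, DecoderFactory.get().jsonDecoder(schema, json)); true }
  catch { case _: Exception => false }

println(isValid("""{"id": 7}"""))   // true
println(isValid("""{"id": "x"}""")) // false
```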
1 vote · 0 answers · 46 views

Loading of Apache Avro plugin for Tranquility fails with Exception

For the Kafka Avro producer I run: ./kafka-avro-console-producer --broker-list localhost:9092 --topic pageviews --property value.schema='{"type":"record","name":"mypageviews","fields":[{"name":"time","type":"string"},{"name":"url","type":"string"},{"name":"user","type":"string"},{"name":"latencyMs"...
Saeed Mohtasham
1 vote · 1 answer · 924 views

Spark read avro

Trying to read an Avro file: val df = spark.read.avro(file) Running into: Avro schema cannot be converted to a Spark SQL StructType: [ "null", "string" ] I tried to manually create a schema, but am now running into the following: val s = StructType(List(StructField("value", StringType, nullable = true)))...
timvw
1 vote · 1 answer · 282 views

Build error using Apache Avro generated files

I am using Gradle to build a simple app using Apache Avro. The following are the relevant files: File build.gradle plugins { id 'com.commercehub.gradle.plugin.avro' version '0.9.0' } apply plugin: 'application' group 'ahmed' version '1.0-SNAPSHOT' sourceCompatibility = 1.8 dependencies { compile 'org.apa...
Ahmed A
