Pushing down filter predicate in Spark JDBC Properties


April 2019

Viewed 612 times


How can I set up my Spark JDBC options to make sure the filter predicate is pushed down to the database rather than loading everything first? I'm using Spark 2.1. I can't get the right syntax, and I know I can add a where clause after the load() but that would obviously load everything first. I'm trying the code below, but whereas this filter takes a couple of seconds when run in my DB client, it never returns and just keeps running when I try to push the predicate down through Spark JDBC.

val jdbcReadOpts = Map(
  "url" -> url,
  "driver" -> driver,
  "user" -> user,
  "password" -> pass,
  "dbtable" -> tblQuery,
  "inferSchema" -> "true")

val predicate = "DATE(TS_COLUMN) = '2018-01-01'"
// Also tried -> val predicate = "SIMPLEDATECOL = '2018-01-01'"

val df = spark.read.format("jdbc")
  .options(jdbcReadOpts)
  .load().where(predicate)

2 answers


As per the ParquetFilters.scala source code, only the following types are allowed to be pushed down as predicate filters:

BooleanType, IntegerType, LongType, FloatType, DoubleType, BinaryType

I found these links useful for understanding this functionality.
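
As an illustration (the Parquet path and the id column below are hypothetical), you can check what actually gets pushed down by filtering on one of the supported types and inspecting the physical plan:

// Hypothetical Parquet file with an integer column "id".
val parquetDf = spark.read.parquet("/tmp/events.parquet")

// A comparison on a supported type should surface in the scan node,
// e.g. PushedFilters: [IsNotNull(id), GreaterThan(id,100)]
parquetDf.where("id > 100").explain()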


"and I know I can add a where clause after the load() but that would obviously load everything first."

This is not true. Predicates used in where that can be pushed down (not every predicate can be; predicates based on functions are the most obvious example) are pushed down automatically. Be aware that caching might affect predicate pushdown in some cases.
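
A minimal sketch, reusing the jdbcReadOpts map and the column names from the question, showing both cases side by side:

val jdbcDf = spark.read.format("jdbc")
  .options(jdbcReadOpts)
  .load()

// A plain comparison on a column can be compiled into the remote query;
// it should appear under PushedFilters in the physical plan:
jdbcDf.where("SIMPLEDATECOL = '2018-01-01'").explain()

// A predicate that wraps the column in a function cannot be translated
// into a source filter, so Spark evaluates it after fetching the rows:
jdbcDf.where("to_date(TS_COLUMN) = '2018-01-01'").explain()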

To push down limits, aggregations, and non-trivial predicates, you can use a query string as the dbtable option: see Does spark predicate pushdown work with JDBC?.
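
For example, a sketch of the query-string approach reusing the question's options map (MYTABLE is a placeholder for the real table name):

// The subquery must be parenthesized and aliased so it can stand in a
// FROM clause; the database then evaluates the DATE() filter itself,
// before any rows reach Spark.
val tblQuery = "(SELECT * FROM MYTABLE WHERE DATE(TS_COLUMN) = '2018-01-01') t"

val df = spark.read.format("jdbc")
  .options(jdbcReadOpts + ("dbtable" -> tblQuery))
  .load()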