Questions tagged [azure-databricks]

0 votes · 0 answers · 3 views

Using Databricks Connect through a proxy

I would like to use databricks-connect through a proxy that requires authentication. I am using a Linux OS and Azure Databricks. I have configured databricks-connect within my home network, and databricks-connect test works correctly. In my office, however, I need to set up a proxy. This proxy uses a basic...
LizardKing
1 vote · 0 answers · 19 views

Mapping headers into a PySpark SQL DataFrame

I am working on Azure Databricks, and my scenario is the following: I'm reading a CSV file stored in Blob storage (using spark.read.format('csv').options().load()). The file contains 1000 (one thousand) columns/variables, but the data and the header are in separate files. I want to map headers int...
FelipePerezR
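
A minimal sketch of one common approach, assuming the header lives in its own one-line CSV file; the paths are placeholders, not the asker's:

```python
# Sketch: read the data without headers, read the separate header file,
# and rename the autogenerated _c0.._c999 columns in one shot.
# (spark is predefined in Databricks notebooks.)
data_df = (spark.read.format("csv")
           .option("header", "false")
           .load("/mnt/blob/data/*.csv"))          # placeholder path

# The header file is assumed to be a single comma-separated line.
header_line = spark.read.text("/mnt/blob/header/header.csv").first()[0]
columns = header_line.split(",")

named_df = data_df.toDF(*columns)
```
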
1 vote · 1 answer · 84 views

Can a pure Python script (not PySpark) run in parallel on a cluster in Azure Databricks?

I want to migrate my Python scripts from running locally to running in the cloud, specifically on a cluster created in Azure Databricks. Can a pure Python script run in parallel (using multiple nodes of a cluster at the same time) without having to be converted to PySpark? Is it possible to check whether the job is r...
S.J.
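
One common pattern, sketched here with `process_item` as a hypothetical stand-in for the script's real per-item work: wrap the inputs in an RDD so Spark schedules the plain-Python function across the worker nodes.

```python
# Sketch: distribute an ordinary Python function over the cluster by
# mapping it across an RDD's partitions. No PySpark DataFrame code needed.
def process_item(item):
    return item * item  # placeholder for real pure-Python work

inputs = range(10000)
results = (spark.sparkContext
           .parallelize(inputs, 32)  # 32 partitions spread across the nodes
           .map(process_item)
           .collect())
```
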
1 vote · 1 answer · 218 views

Databricks read Azure blob last modified date

I have an Azure Blob storage container mounted to my Databricks HDFS. Is there a way to get the last-modified date of a blob in Databricks? This is how I'm reading the blob content: val df = spark.read.option("header", "false").option("inferSchema", "false").option("delimiter", ",").csv("/mnt/test/*")
ilanak
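
A minimal sketch of one approach: Spark's reader does not expose timestamps, but the underlying Hadoop FileSystem API does. This is the Python (py4j) form; the asker's Scala code can call the same API directly. The mount path follows the question.

```python
# Sketch: list files with their modification times via the Hadoop
# FileSystem API reached through Spark's JVM gateway.
sc = spark.sparkContext
path = sc._jvm.org.apache.hadoop.fs.Path("/mnt/test/")
fs = path.getFileSystem(sc._jsc.hadoopConfiguration())

for status in fs.listStatus(path):
    # getModificationTime() returns epoch milliseconds.
    print(status.getPath().getName(), status.getModificationTime())
```
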
1 vote · 0 answers · 164 views

Azure Databricks notebook unable to find “dbutils” when the class is in a package

I am creating a class for communicating with Azure storage blobs and it works fine, but if I put this class in a package it gives me the error 'error: not found: value dbutils'. It works fine if I remove the 'package Libraries.Custom' line above my class. I am creating a class in azur...
Omair Zia
1 vote · 1 answer · 89 views

Attaching a library to Azure databricks cluster

I want to make use of ts-flint on Azure Databricks. I believe the process is documented here: https://docs.azuredatabricks.net/user-guide/libraries.html. I tried to create a library from the Azure portal and attach it to my testCluster using the instructions provided, but I can't seem to see it...
user1761806
1 vote · 0 answers · 143 views

Unable to connect to Azure Cosmos DB using the MongoDB API

I am trying to connect to Azure Cosmos DB using the MongoDB API (Spark MongoDB connector) to export data to HDFS, but I get the exception below. Here is the complete stack trace: { '_t' : 'OKMongoResponse', 'ok' : 0, 'code' : 115, 'errmsg' : 'Command is not supported', '$err' : 'Command is not supporte...
shatabdi Mukherjee
1 vote · 2 answers · 107 views

Generate Azure Databricks token using PowerShell script

I need to generate an Azure Databricks token using a PowerShell script. I am done with the creation of Azure Databricks using an ARM template; now I am looking to generate a Databricks token using a PowerShell script. Kindly let me know how to create a Databricks token with PowerShell.
kartik iyer
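
For reference, a sketch of the REST call that underlies token generation (the Databricks Token API, POST /api/2.0/token/create); PowerShell's Invoke-RestMethod can issue the same request. It assumes you already hold one valid bearer token to authenticate with; the workspace URL and token are placeholders.

```python
# Sketch: create a new personal access token via the Token REST API.
import requests

workspace_url = "https://<region>.azuredatabricks.net"     # placeholder
bootstrap_token = "<existing PAT or Azure AD access token>"  # placeholder

resp = requests.post(
    f"{workspace_url}/api/2.0/token/create",
    headers={"Authorization": f"Bearer {bootstrap_token}"},
    json={"lifetime_seconds": 3600, "comment": "created from a script"},
)
resp.raise_for_status()
new_token = resp.json()["token_value"]
```
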
1 vote · 0 answers · 31 views

Writes to Cosmos DB from pyspark run forever / never succeed nor fail

I'm using pyspark / Databricks to ETL data from Parquet files to Cosmos DB (DocumentDB API). Despite a first successful test with 5 rows of data, every subsequent attempt to write data to Cosmos DB goes nowhere. Even with only one row, it just runs forever. When monitoring Cosmos DB there is a regula...
Vincent Chalmel
1 vote · 1 answer · 87 views

Get exceptions from Databricks Notebook in Azure Data Factory pipeline

I've added a Databricks Notebook to a Data Factory pipeline. If the Python script inside the notebook throws an exception, the exception is not surfaced by the pipeline. I know there is a runPageUrl where I can see the results, but I want the pipeline to know if an error occurred in the Python...
Harm Cox
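
A minimal sketch of one common pattern inside the notebook, with `run_etl` as a hypothetical stand-in for the notebook's real work: re-raising fails the notebook run (and therefore the ADF activity), while dbutils.notebook.exit hands a payload back that ADF exposes as runOutput.

```python
# Sketch: surface failures to the calling ADF pipeline.
# (dbutils is predefined in Databricks notebooks.)
import json

try:
    run_etl()  # placeholder for the notebook's actual processing
except Exception as e:
    # Option 1: re-raising fails the notebook run, which fails the ADF activity:
    # raise
    # Option 2: exit with a payload the pipeline can branch on:
    dbutils.notebook.exit(json.dumps({"status": "failed", "error": str(e)}))

dbutils.notebook.exit(json.dumps({"status": "succeeded"}))
```
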
1 vote · 0 answers · 7 views

Can we work with external APIs in Azure Databricks?

Being a newbie to Databricks, I'm just exploring ways to access third-party APIs in Databricks. Example: one scenario is checking whether a JSON file being processed via Databricks is in correct JSON format or not. We have one API which validates this format; the question is, can we co...
Abhi
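
A minimal sketch: a Databricks notebook can call any reachable HTTP API with plain Python; the validation endpoint and its response shape here are hypothetical.

```python
# Sketch: call an external validation API from a Databricks notebook.
import requests

def is_valid_json(payload: str) -> bool:
    resp = requests.post(
        "https://validator.example.com/api/check",  # placeholder URL
        data=payload,
        headers={"Content-Type": "application/json"},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json().get("valid", False)  # hypothetical response field
```
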
1 vote · 1 answer · 103 views

Efficient way of reading parquet files between a date range in Azure Databricks

I would like to know if the pseudocode below is an efficient method to read multiple Parquet files between a date range stored in Azure Data Lake from PySpark (Azure Databricks). Note: the Parquet files are not partitioned by date. I'm using uat/EntityName/2019/01/01/EntityName_2019_01_01_HHMMSS.parquet con...
samratb
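
A minimal sketch following the uat/EntityName/YYYY/MM/DD layout in the question: enumerate the day folders explicitly and pass them to a single read, so Spark only lists those directories. It assumes every day folder in the range exists.

```python
# Sketch: read only the day folders in the requested range, in one call.
from datetime import date, timedelta

start, end = date(2019, 1, 1), date(2019, 1, 31)
days = [start + timedelta(days=i) for i in range((end - start).days + 1)]
paths = [f"uat/EntityName/{d:%Y/%m/%d}/" for d in days]

df = spark.read.parquet(*paths)
```
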
1 vote · 1 answer · 99 views

Issue connecting to Databricks table from Azure Data Factory using the Spark ODBC connector

We have managed to get a valid connection from Azure Data Factory to our Azure Databricks cluster using the Spark (ODBC) connector. In the list of tables we do get the expected list, but when querying a specific table we get an exception: ERROR [HY000] [Microsoft][Hardy] (35) Error from serv...
BTV
1 vote · 1 answer · 39 views

Azure Databricks Spark XML library - trying to read XML files

I am trying to create a Databricks notebook to read an XML file from Azure Data Lake and convert it to Parquet. I got the spark-xml library from here: https://github.com/databricks/spark-xml. I followed the example provided on GitHub but am not able to get it working. df = (spark.read.format('xml')...
Satya Azure
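
A minimal sketch with spark-xml, assuming the library is attached to the cluster. rowTag must name the repeating element in the actual files; "record" and the paths here are placeholders.

```python
# Sketch: read XML with spark-xml, then write Parquet.
df = (spark.read.format("xml")
      .option("rowTag", "record")               # must match the real element name
      .load("adl://<lake>.azuredatalakestore.net/in/*.xml"))

df.write.mode("overwrite").parquet("/mnt/out/parquet/")
```
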
1 vote · 0 answers · 23 views

Databricks JDBC Integrated Security

Help :) I need to connect from my Azure Databricks cluster to an Azure SQL instance using my Azure AD credentials. I have tested that I can connect to the target database using SSMS (SQL Server Management Studio) with my Azure AD credentials, so that works fine. Firewall connectivity is fine. I hav...
Murray Foxcroft
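
True integrated (Kerberos/Windows) authentication is not available from a Linux Databricks cluster, but the Microsoft SQL Server JDBC driver accepts Azure AD username/password authentication. A sketch, with server, database, user, and secret names as placeholders:

```python
# Sketch: Azure AD username/password auth through the SQL Server JDBC driver.
jdbc_url = ("jdbc:sqlserver://myserver.database.windows.net:1433;"
            "database=mydb;authentication=ActiveDirectoryPassword")

df = (spark.read.format("jdbc")
      .option("url", jdbc_url)
      .option("dbtable", "dbo.MyTable")
      .option("user", "me@mytenant.onmicrosoft.com")
      .option("password", dbutils.secrets.get(scope="kv", key="aad-pwd"))
      .load())
```
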
1 vote · 1 answer · 26 views

How to call a notebook or run jobs in Databricks from C# using Mobius?

I'm new to Databricks. Is it possible to send code through an API (like Mobius) from C# to run jobs in Databricks? Could you possibly give me a code example, such as running a job in a notebook which contains NoSQL code? Thank you.
Sattawat Boonchoo
1 vote · 0 answers · 53 views

Hive managed table drop doesn't delete files on HDFS. Any solutions?

When deleting managed tables from Hive, their associated files are not being removed from HDFS (on Azure Databricks). I am getting the following error: [Simba]SparkJDBCDriver ERROR processing query/statement. Error Code: 0, SQL state: org.apache.spark.sql.AnalysisException: Can not create the man...
JITENDRA
1 vote · 1 answer · 79 views

Execute Python script from Azure Data Factory

Can someone help me with executing a Python function from Azure Data Factory? I have stored the Python function in Blob storage and I'm trying to trigger it, but I'm not able to. Please assist. Second, can I parameterize the Python function call from ADF?
Nomad18
1 vote · 0 answers · 79 views

How to read a CSV file whose data contains double quotes and comma-separated values using a Spark DataFrame in Databricks

I'm trying to read a CSV file using a Spark DataFrame in Databricks. The CSV file contains double-quoted, comma-separated columns. I tried the code below and am not able to read the CSV file, but if I check the file in the Data Lake I can see it. The input and output are as follows: df = spark.rea...
pythonUser
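
A minimal sketch: declare the quote character and, crucially, set the escape character to `"` as well, so commas and doubled quotes inside quoted fields are parsed instead of splitting columns. The path is a placeholder.

```python
# Sketch: read quoted CSV where fields may contain commas.
df = (spark.read.format("csv")
      .option("header", "true")
      .option("quote", '"')
      .option("escape", '"')
      .load("/mnt/datalake/input/file.csv"))
```
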
1 vote · 0 answers · 15 views

How to execute IntelliJ Spark code on a Databricks cluster

I'm trying to launch Spark code that I've written in IntelliJ and run it on Databricks, and I've found that this can be done with the 'sbt-databricks' plugin. Here is my build.sbt file: name := "DatabricksTest" version := "1.0" scalaVersion := "2.11.8" libraryDependencies ++= Seq("org.apache.spark" %% "...
I.Chorfi
1 vote · 1 answer · 34 views

Connect to Azure SQL Database from DataBricks using Service Principal

I have a requirement to connect to an Azure SQL Database from Azure Databricks via a service principal. I tried searching forums but was unable to find the right approach. Any help is greatly appreciated. I tried a similar approach with a SQL user ID and password over a JDBC connection and it worked successfully. No...
user1483122
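
A minimal sketch of one documented pattern: acquire an Azure AD access token for the service principal (here via the adal package, which must be installed on the cluster) and hand it to the SQL Server JDBC driver through the accessToken property. All IDs and names are placeholders.

```python
# Sketch: service-principal auth to Azure SQL via an AAD access token.
import adal

ctx = adal.AuthenticationContext("https://login.microsoftonline.com/<tenant-id>")
token = ctx.acquire_token_with_client_credentials(
    "https://database.windows.net/", "<client-id>", "<client-secret>"
)["accessToken"]

df = (spark.read.format("jdbc")
      .option("url", "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb")
      .option("dbtable", "dbo.MyTable")
      .option("accessToken", token)
      .option("encrypt", "true")
      .option("hostNameInCertificate", "*.database.windows.net")
      .load())
```
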
1 vote · 1 answer · 36 views

Pushing logs to Log Analytics from Databricks

I have logs collected in a Databricks cluster, but I need them pushed to Log Analytics in Azure to have a common log collection point. I have not tried anything yet but would like to know what the approach is.
Teik Hooi Beh
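
A minimal sketch of one approach, the Log Analytics HTTP Data Collector API: each request is signed with HMAC-SHA256 over a canonical string using the workspace's shared key. workspace_id and shared_key are placeholders for the workspace's real values.

```python
# Sketch: push JSON records to Log Analytics via the Data Collector API.
import base64, datetime, hashlib, hmac, json
import requests

workspace_id = "<workspace-id>"               # placeholder
shared_key = "<primary-or-secondary-key>"     # placeholder

def post_logs(records, log_type="DatabricksLog"):
    body = json.dumps(records)
    rfc1123 = datetime.datetime.utcnow().strftime("%a, %d %b %Y %H:%M:%S GMT")
    to_sign = f"POST\n{len(body)}\napplication/json\nx-ms-date:{rfc1123}\n/api/logs"
    sig = base64.b64encode(
        hmac.new(base64.b64decode(shared_key),
                 to_sign.encode("utf-8"), hashlib.sha256).digest()).decode()
    resp = requests.post(
        f"https://{workspace_id}.ods.opinsights.azure.com/api/logs?api-version=2016-04-01",
        data=body,
        headers={"Content-Type": "application/json",
                 "Authorization": f"SharedKey {workspace_id}:{sig}",
                 "Log-Type": log_type,
                 "x-ms-date": rfc1123})
    resp.raise_for_status()

# e.g. post_logs([{"level": "INFO", "message": "job finished"}])
```
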
1 vote · 1 answer · 41 views

(SPARK) What is the best way to partition data on which multiple filters are applied?

I am working in Spark (on Azure Databricks) with a 15 billion row file with the columns client_id, transaction_key, transaction_date, product_id, store_id, and spend...
RobL
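
A minimal sketch of the usual answer: persist the data partitioned by the columns the downstream filters touch most often, so those filters prune whole directories instead of scanning all 15 billion rows. The column names come from the question; the output path and partition choice are assumptions.

```python
# Sketch: write partitioned so later filters prune directories.
(df.write
   .partitionBy("transaction_date", "store_id")  # pick to match your filter patterns
   .mode("overwrite")
   .parquet("/mnt/curated/transactions/"))
```
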
1 vote · 1 answer · 25 views

How to run SQL statement from Databricks cluster

I have an Azure Databricks cluster that processes various tables, and as a final step I push these tables into an Azure SQL Server to be used by some other processes. I have a cell in Databricks that looks something like this: def generate_connection(): jdbcUsername = dbutils.secrets.get(scope =...
Paul Cavacas
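
A minimal sketch of a common workaround: Spark's JDBC reader/writer only moves data, but an arbitrary statement (UPDATE, TRUNCATE, EXEC ...) can be sent through the JVM's DriverManager. spark._jvm is an internal handle; the URL and secret names are placeholders in the question's style.

```python
# Sketch: run a non-query SQL statement against Azure SQL from Databricks.
jdbcUsername = dbutils.secrets.get(scope="kv", key="sql-user")
jdbcPassword = dbutils.secrets.get(scope="kv", key="sql-password")
jdbcUrl = "jdbc:sqlserver://myserver.database.windows.net:1433;database=mydb"

conn = spark._jvm.java.sql.DriverManager.getConnection(jdbcUrl, jdbcUsername, jdbcPassword)
try:
    stmt = conn.createStatement()
    stmt.executeUpdate("TRUNCATE TABLE dbo.Staging_MyTable")
    stmt.close()
finally:
    conn.close()
```
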
1 vote · 0 answers · 35 views

Pyspark: dropping columns with no distinct values using only transformations

I have a huge DataFrame with 1340 columns. Before diving into modeling, I must get rid of columns with no distinct values. The few ways I found to do this require actions on the DataFrame, i.e. they take a lot of time (approx. 75 hours). How can I solve this using only transformations, in order to save a lo...
LePuppy
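
A minimal sketch of one mitigation: the decision needs to look at the data, so at least one action is unavoidable, but 1340 separate jobs can collapse into a single aggregation that counts every column's distinct values in one pass.

```python
# Sketch: one aggregation instead of one job per column.
# F.approx_count_distinct is a cheaper drop-in if exactness isn't needed.
from pyspark.sql import functions as F

counts = df.agg(*[F.countDistinct(F.col(c)).alias(c) for c in df.columns]).first()
keep = [c for c in df.columns if counts[c] > 1]
df_reduced = df.select(*keep)
```
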
1 vote · 0 answers · 16 views

Unable to write DataFrame to Azure Cosmos DB

I am unable to write data to Cosmos DB using a Databricks Spark cluster. I have tried all the links and solutions on Stack Overflow and GitHub, and tried just about every version of all the possible JARs. The error stack is: java.lang.NoSuchMethodError: com.microsoft.azure.documentdb.Offer.getC...
Muhammad Fayyaz
1 vote · 1 answer · 22 views

Folium map not showing in Databricks Python

I am working on Databricks and have a folium map: import geopandas as gpd import matplotlib.pyplot as plt import os import folium from IPython.display import display map_osm = folium.Map(location=[45.5236, -122.6750]) map_osm I get the following: I tried 'Folium map not displaying' to no avail. Any suggestio...
alex
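
A minimal sketch of a common workaround: Databricks doesn't render folium objects as cell output, but displayHTML (a Databricks notebook built-in) can render the map's HTML representation.

```python
# Sketch: render a folium map in a Databricks notebook cell.
import folium

map_osm = folium.Map(location=[45.5236, -122.6750])
displayHTML(map_osm._repr_html_())  # displayHTML is provided by Databricks
```
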
0 votes · 0 answers · 3 views

How to import one databricks notebook into another?

I have a Python notebook A in Azure Databricks with an import statement as below: import xyz, datetime, ... I have another notebook xyz, which is imported in notebook A as shown in the code above. When I run notebook A, it throws the following error: ImportError: No module named xyz. Both notebooks are in...
user39602
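
For reference, a sketch of the usual fix, assuming both notebooks sit in the same folder: a Databricks notebook is not a Python module, so import only works for libraries; another notebook is pulled in with the %run magic, alone in its own cell.

```
%run ./xyz
```
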
0 votes · 1 answer · 65 views

Azure Databricks - "Can not create the managed table. The associated location already exists"

I have the following problem in Azure Databricks. Sometimes when I try to save a DataFrame as a managed table: SomeData_df.write.mode('overwrite').saveAsTable('SomeData') I get the following error: 'Can not create the managed table('SomeData'). The associated location('dbfs:/user/hive/warehouse/some...
BuahahaXD
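
A minimal sketch of the usual workaround: drop any stale table definition and remove the leftover warehouse directory before recreating the table. The path mirrors the error message in the question.

```python
# Sketch: clear the stale table and its leftover location, then recreate.
spark.sql("DROP TABLE IF EXISTS SomeData")
dbutils.fs.rm("dbfs:/user/hive/warehouse/somedata", True)  # True = recursive

SomeData_df.write.mode("overwrite").saveAsTable("SomeData")
```
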
1 vote · 2 answers · 0 views

Data Explorer: ImportError No module named Kqlmagic

I'm following this tutorial: https://docs.microsoft.com/en-us/azure/data-explorer/kqlmagic. I have a Databricks cluster, so I decided to use the notebook available there. When I get to step 2 and run reload_ext Kqlmagic, I get the error message: ImportError: No module named Kqlmagic
user1761806
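
A minimal sketch, assuming the ImportError simply means the package isn't on the cluster: install it for the notebook session first (dbutils.library.installPyPI was the notebook-scoped mechanism in this Databricks runtime era), then reload the extension in a fresh cell.

```python
# Sketch: install Kqlmagic before loading the magic.
dbutils.library.installPyPI("Kqlmagic")
dbutils.library.restartPython()

# then, in a new cell:
# %reload_ext Kqlmagic
```
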
1 vote · 1 answer · 0 views

Databricks file copy with dbutils only if file doesn't exist

I'm using the following Databricks utilities (dbutils) command to copy files from one location to another, as shown below: dbutils.fs.cp('adl://dblake.azuredatalakestore.net/jfolder2/thisfile.csv','adl://cadblake.azuredatalakestore.net/landing/') However, I want the file to be copied over only if no s...
Carltonp
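
A minimal sketch: dbutils.fs has no copy-if-absent flag, but dbutils.fs.ls raises when a path doesn't exist, so it can serve as an existence check. The paths come from the question; the destination file name is an assumption.

```python
# Sketch: copy only when the destination is absent.
src = "adl://dblake.azuredatalakestore.net/jfolder2/thisfile.csv"
dst = "adl://cadblake.azuredatalakestore.net/landing/thisfile.csv"

def path_exists(path):
    try:
        dbutils.fs.ls(path)
        return True
    except Exception:  # ls raises when the path does not exist
        return False

if not path_exists(dst):
    dbutils.fs.cp(src, dst)
```
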
1 vote · 1 answer · 0 views

How to TRUNCATE and/or use wildcards with Databricks

I'm trying to write a script in Databricks that will select a file based on certain characters in its name, or just on the datestamp in the file name. For example, a file looks as follows: LCMS_MRD_Delta_LoyaltyAccount_1992_2018-12-22 06-07-31. I have created the following code in...
Carltonp
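
A minimal sketch: dbutils.fs.ls doesn't take wildcards, but the listing can be filtered with ordinary glob patterns in Python. The directory and the pattern (built from the file-name shape in the question) are assumptions.

```python
# Sketch: wildcard-style selection over a dbutils listing.
import fnmatch

files = dbutils.fs.ls("/mnt/landing/")  # placeholder directory
matches = [f.path for f in files
           if fnmatch.fnmatch(f.name, "LCMS_MRD_Delta_LoyaltyAccount_*2018-12-22*")]
```
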
1 vote · 1 answer · 0 views

Create Azure Databricks Token using ARM template

I need to create a token in Azure Databricks using an ARM template. I am able to create the Azure Databricks workspace using an ARM template, but I am unable to create a token in it the same way. Following is the template I have used to create Azure Databricks: { '$schema': 'https://schema.management.az...
kartik iyer
1 vote · 1 answer · 0 views

How to update an Azure SQL Database / Data Warehouse table from Azure Databricks?

I have a requirement in my project to implement an SCD type 2 table in Azure SQL DW. I am able to insert new records using the JDBC connector, but I need to mark old records as 'expired' and update other records as per the updated values.
shubham nayak
1 vote · 2 answers · 0 views

R Version on Azure Databricks

Azure Databricks currently runs R version 3.4.4 (2018-03-15), which is unacceptable in my opinion, since the latest R version on CRAN is 3.5.2 (2018-12-20). My question is: is it possible for me to upgrade and install R version 3.5.2 on Azure Databricks? Follow-up question: is there any information...
jsb
1 vote · 3 answers · 0 views

Login to Azure ML workspace from Azure Databricks notebook

I am writing a Python notebook in an Azure Databricks cluster to perform an Azure Machine Learning experiment. I have created an Azure ML workspace and am instantiating a workspace object in my notebook as follows: id = InteractiveLoginAuthentication(force=False, tenant_id=AzureTenantId) ws = Workspace(Su...
Charanjit
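
A minimal sketch of the azureml-core pattern, assuming the SDK is installed on the cluster; all IDs are placeholders. InteractiveLoginAuthentication pops a device-login prompt, so ServicePrincipalAuthentication is the usual choice for unattended cluster runs.

```python
# Sketch: instantiate an Azure ML Workspace object from a notebook.
from azureml.core import Workspace
from azureml.core.authentication import InteractiveLoginAuthentication

auth = InteractiveLoginAuthentication(force=False, tenant_id="<tenant-id>")
ws = Workspace(subscription_id="<subscription-id>",
               resource_group="<resource-group>",
               workspace_name="<workspace-name>",
               auth=auth)
```
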
1 vote · 1 answer · 0 views

How to access on premise Teradata from Azure Databricks

We need to connect to on-premise Teradata from Azure Databricks. Is that possible at all? If yes, please let me know how.
Ayan
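
A minimal sketch, assuming network line-of-sight to the on-premise host (VPN or ExpressRoute) and the Teradata JDBC driver JAR attached to the cluster. Host, database, and credentials are placeholders.

```python
# Sketch: read a Teradata table over JDBC from Databricks.
df = (spark.read.format("jdbc")
      .option("url", "jdbc:teradata://teradata-host.corp.local/DATABASE=mydb")
      .option("driver", "com.teradata.jdbc.TeraDriver")
      .option("dbtable", "mydb.mytable")
      .option("user", "td_user")
      .option("password", dbutils.secrets.get(scope="kv", key="td-password"))
      .load())
```
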
1 vote · 1 answer · 0 views

Databricks Spark CREATE TABLE takes forever for 1 million small XML files

I have a set of 1 million XML files, each ~14 KB in size, in Azure Blob Storage, mounted in Azure Databricks, and I am trying to use CREATE TABLE with the expectation of one record per file. The experiment: the content structure of the files is depicted below. For simplicity and performance exper...
Abhra Basak
1 vote · 1 answer · 0 views

Databricks Notebook with %run - Not working

I have referenced the topic 'How to pass a variable to the magic run function in IPython' for triggering a notebook from another notebook. notebook = '/Users/xxx/TestFolder/Notebook1' In the next cell, I am trying to call %run like this, as per the solution suggested in the article above: %run $note...
Satya Azure
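
A minimal sketch of the usual alternative: Databricks' %run does not interpolate variables, but dbutils.notebook.run takes the path as a plain string (the second argument is a timeout in seconds). Note it runs the target as a separate notebook job rather than inlining its definitions the way %run does.

```python
# Sketch: launch a notebook whose path is held in a variable.
notebook = "/Users/xxx/TestFolder/Notebook1"
result = dbutils.notebook.run(notebook, 600)
```
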
1 vote · 1 answer · 0 views

Best way to install a library on Azure Databricks

I need to install the azure library on my Azure Databricks cluster. Right now I am installing it globally, but sometimes when the cluster starts my notebook fails with an error like: AttributeError: module 'lib' has no attribute 'SSL_ST_INIT'. What would be the best way to install a library on Azure Databricks, inst...
shubham nayak
