Questions tagged [google-bigquery]
4392 questions
1 vote, 1 answer, 167 views
BigQuery REGEXP_REPLACE error (\? vs \\?)
I am having issues understanding what's wrong with this regex: \?.*
select REGEXP_REPLACE(longstringcolumn, '\?.*', '') as newstring from tablename
My example string, aka 'longstring', contains a '?' character, and I am trying to match everything from the '?' onward (including the '?' itself).
I have checked my regexp...
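The \? vs \\? in the title points at string-literal escaping rather than at the pattern itself. To our understanding of BigQuery standard SQL string literals, `\?` inside an ordinary quoted string is an escape for a plain `?`, so `'\?.*'` reaches the regex engine as `?.*`, which has nothing for `?` to repeat; `'\\?.*'` or the raw literal `r'\?.*'` keeps the backslash. The Python sketch below uses Python's `re` as a stand-in for BigQuery's regex engine (an assumption) to show that the escaped pattern does what the question wants and the collapsed one does not compile. The function names are hypothetical.

```python
import re

# Raw Python string: the backslash survives, so \? matches a literal '?'.
ESCAPED = r"\?.*"
# What the SQL literal '\?.*' collapses to once \? is consumed as an escape.
COLLAPSED = "?.*"


def strip_from_question_mark(s: str) -> str:
    """Remove the first '?' and everything after it."""
    return re.sub(ESCAPED, "", s)


def pattern_compiles(pattern: str) -> bool:
    """True if the pattern is a valid regular expression."""
    try:
        re.compile(pattern)
        return True
    except re.error:
        return False


print(strip_from_question_mark("https://example.com/page?a=1&b=2"))  # https://example.com/page
print(pattern_compiles(COLLAPSED))  # False
```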
1 vote, 2 answers, 142 views
How to query a Google BigQuery table and remove duplicates based on a subset of columns?
I have a query that joins two BigQuery tables and produces a table with 6 columns (a, b, c, d, e, f). Next, I move that table to a Google Cloud Storage bucket and then download the bucket's contents as a set of CSVs. Finally, I insert those CSVs into a Postgres table that has a composite primary key on columns a and b.
The...
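Because the Postgres table rejects duplicate (a, b) pairs, the export has to contain at most one row per key. In BigQuery this is typically done by keeping the rows where ROW_NUMBER() OVER (PARTITION BY a, b) = 1; the Python sketch below shows the same keep-one-row-per-composite-key logic on in-memory tuples. Keeping the first row seen is an assumption, since the question does not say which duplicate should win, and `dedupe_on_key` is a hypothetical helper.

```python
from typing import Iterable, List, Tuple

Row = Tuple  # e.g. (a, b, c, d, e, f)


def dedupe_on_key(rows: Iterable[Row], key_len: int = 2) -> List[Row]:
    """Keep the first row seen for each composite key formed by the first key_len columns."""
    seen = set()
    result = []
    for row in rows:
        key = row[:key_len]  # (a, b)
        if key not in seen:
            seen.add(key)
            result.append(row)
    return result


rows = [(1, "x", 10), (1, "x", 20), (2, "x", 30)]
print(dedupe_on_key(rows))  # [(1, 'x', 10), (2, 'x', 30)]
```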
1 vote, 2 answers, 1.4k views
BigQuery: Syntax error: Unexpected keyword LEFT
I got this error of 'Syntax error: Unexpected keyword LEFT' from the following SQL (standard SQL) in BigQuery:
select left(cast(ts as string), 16) from temp.loc limit 1;
'ts' is a timestamp field, and I want to keep only the part up to the minute. Any idea?
1 vote, 2 answers, 107 views
BigQuery Authorized View Cost Billing Account
When two projects are under two different billing accounts and an authorized view spans the two projects, which billing account is billed for the query cost on the views?
Scenario:
Project A contains the views, which use Project B's dataset containing the actual data. When analysts r...
1 vote, 1 answer, 450 views
BigQuery: Questions on Delete and Update rows using nodejs
I've found a lot of Node.js examples that query and insert data into BigQuery, but no example or API description of how to delete and update rows in the database. I am aware of the limitations (30 minutes since the last change, etc.).
The only hint I found came from VS Code:
bigQuery.dataset...
1 vote, 2 answers, 59 views
How to pass query statement to bigquery in node.js environment
When querying BigQuery, I want to pass my function's parameters into the SQL statement as @-prefixed variables.
However, I can't find a method that supports this in Node.js.
For Python, there are methods like the following example.
You can use the function's parameters...
1 vote, 2 answers, 61 views
Is there a quicker way to initialise a BigQuery client?
Using the recommended way to initialise a BigQuery client from the Google documentation (Quickstart: Using Client Libraries) takes 15 seconds to complete. This seems very slow; is there a quicker way?
import com.google.cloud.bigquery.BigQuery;
import com.google.cloud.bigquery.BigQueryOptions;
publ...
1 vote, 3 answers, 59 views
Wildcard table matches with _TABLE_SUFFIX and sub-query
The _TABLE_SUFFIX feature is great and exactly what I was looking for to solve my problem - however it is scanning all of the data matched by the wildcard when I use a sub-query to determine which tables to match on.
If you do an operation such as = or BETWEEN or IN with a set of values on _TABLE_SU...
1 vote, 3 answers, 66 views
Changing query to avoid “Aggregations of aggregations are not allowed” in Bigquery
Given user and order tables, I need to count the users who made their first order on the day after their registration date.
I managed to list such users with the following query:
SELECT
users.first_name as first_name,
users.last_name as last_name,
users.registration_date as registration_date,
min(orders...
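The error comes from nesting one aggregate inside another. The usual restructuring is two levels: compute each user's first order date (the MIN) in a subquery, then count in the outer query. The Python sketch below mirrors those two levels; the input shapes (a `registrations` mapping and `(user_id, order_date)` pairs) and the function name are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List, Tuple


def count_next_day_first_orders(
    registrations: Dict[int, date],   # user_id -> registration_date
    orders: List[Tuple[int, date]],   # (user_id, order_date)
) -> int:
    # Inner level: first order date per user, like MIN(order_date) ... GROUP BY user_id.
    first_order: Dict[int, date] = {}
    for user_id, order_date in orders:
        if user_id not in first_order or order_date < first_order[user_id]:
            first_order[user_id] = order_date
    # Outer level: count users whose first order falls exactly one day after registration.
    return sum(
        1
        for user_id, first in first_order.items()
        if user_id in registrations
        and first == registrations[user_id] + timedelta(days=1)
    )
```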
1 vote, 1 answer, 50 views
Calculate distance on a polyline of a road between 2 lat/lons
This is not distance as the crow flies.
I'm looking for an API like this:
distanceMiles = calculateMilesBetweenPointsAlongRoad(LatLon1, LatLon2, RoadPolyline)
I have a road represented as a polyline.
As a vehicle moves on this road, I capture lat/lons. I want to calculate the distance the vehicle tr...
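One simple approach, sketched in Python under stated assumptions: snap each captured point to the nearest polyline vertex, then sum the great-circle (haversine) lengths of the segments between those two vertices. Snapping to vertices rather than projecting onto segments is a simplification that is only accurate when the polyline is densely sampled; `calculate_miles_along_road` is a hypothetical Python rendering of the API named in the question, and 3958.8 miles is the mean Earth radius.

```python
import math
from typing import List, Tuple

LatLon = Tuple[float, float]  # (latitude, longitude) in degrees
EARTH_RADIUS_MILES = 3958.8   # mean Earth radius


def haversine_miles(p: LatLon, q: LatLon) -> float:
    """Great-circle distance between two points, in miles."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*p, *q))
    dlat, dlon = lat2 - lat1, lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(h))


def nearest_vertex(point: LatLon, road: List[LatLon]) -> int:
    """Index of the polyline vertex closest to point (simplification: no segment projection)."""
    return min(range(len(road)), key=lambda i: haversine_miles(point, road[i]))


def calculate_miles_along_road(p1: LatLon, p2: LatLon, road: List[LatLon]) -> float:
    """Distance along the polyline between the vertices nearest to p1 and p2."""
    i, j = sorted((nearest_vertex(p1, road), nearest_vertex(p2, road)))
    return sum(haversine_miles(road[k], road[k + 1]) for k in range(i, j))
```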
1 vote, 1 answer, 74 views
How to change the column type of a BigQuery repeated record
I'm trying to change the column type of a repeated record from STRING to TIMESTAMP. There are a few suggestions in the BQ docs here (manually-changing-schemas). However, I'm running into issues with each of the recommended suggestions.
Here is an example schema:
{
'name' => 'id',
'type' => 'STRING',
'mode'...
1 vote, 1 answer, 29 views
Find all rows with NULL value(s) in specific column(s) in BigQuery
Is there a way to improve the following? I need to count all rows with NULL value(s) in a specific column.
SELECT
SUM(IF(column1 IS NULL, 1, 0)) AS column1,
SUM(IF(column2 IS NULL, 1, 0)) AS column2
FROM
`dataset.table`;
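One way to avoid hand-writing one `SUM(IF(...))` per column is to generate the SELECT list from a list of column names. A minimal Python sketch that reproduces the question's query shape for any number of columns; `null_count_query` is a hypothetical helper, and the column names are assumed to be valid unquoted identifiers.

```python
from typing import List


def null_count_query(table: str, columns: List[str]) -> str:
    """Build a query that counts NULLs per column, one output column per input column."""
    select_list = ",\n  ".join(
        f"SUM(IF({col} IS NULL, 1, 0)) AS {col}" for col in columns
    )
    return f"SELECT\n  {select_list}\nFROM\n  `{table}`;"


print(null_count_query("dataset.table", ["column1", "column2"]))
```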
0 votes, 2 answers, 24 views
SQL multiple AS columns from WHERE
I have a table
name | age | city
-------------
joe | 42 | berlin
ben | 42 | munich
anna | 22 | hamburg
pia | 50 | berlin
georg | 42 | munich
lisa | 42 | berlin
Now I would like to get all 42-year-olds in separate columns by city:
berlin | munich
-------------
joe | ben
lisa | georg
So I would need so...
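The desired output pairs the n-th 42-year-old of each city on the same row, so each person needs a position within their city (what ROW_NUMBER() OVER (PARTITION BY city ...) would provide in SQL), and rows are then aligned on that position. A Python sketch using the question's sample data; ordering within a city by input order is an assumption, since the table has no ordering column, and cities with fewer people are padded with None. `pivot_by_city` is a hypothetical name.

```python
from collections import defaultdict
from itertools import zip_longest
from typing import Dict, List, Optional, Tuple


def pivot_by_city(
    people: List[Tuple[str, int, str]], age: int, cities: List[str]
) -> List[Tuple[Optional[str], ...]]:
    """Names of people with the given age, one column per city, aligned by position within each city."""
    by_city: Dict[str, List[str]] = defaultdict(list)
    for name, person_age, city in people:
        if person_age == age:
            by_city[city].append(name)  # list position plays the role of ROW_NUMBER()
    return list(zip_longest(*(by_city[c] for c in cities)))


people = [("joe", 42, "berlin"), ("ben", 42, "munich"), ("anna", 22, "hamburg"),
          ("pia", 50, "berlin"), ("georg", 42, "munich"), ("lisa", 42, "berlin")]
print(pivot_by_city(people, 42, ["berlin", "munich"]))  # [('joe', 'ben'), ('lisa', 'georg')]
```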
1 vote, 2 answers, 1.1k views
BigQuery DeDuplication on two columns as unique key
We use BigQuery religiously and have two tables that were essentially updated in parallel by different processes. The problem is that we don't have a unique identifier for the tables, and the goal is to combine the two tables with zero duplication if possible. The unique identifier is two columns combined....
0 votes, 1 answer, 143 views
Possibility of updating data in real-time on a client
I have the following scenario that I was wondering if it's possible/feasible to implement. I apologize if this is considered an overly 'broad' question, but I think SO would be the best place to ask this.
Let us suppose I have a website and I want to display a graph to an end-user. For the purposes...
1 vote, 2 answers, 611 views
Uploading JSON to BigQuery: unspecific error
I am just getting started with the Python BigQuery API (https://github.com/GoogleCloudPlatform/google-cloud-python/tree/master/bigquery) after briefly trying out (https://github.com/pydata/pandas-gbq) and realizing that pandas-gbq does not support the RECORD type, i.e. no nested fields.
Now I am try...
1 vote, 2 answers, 47 views
How can I extract a table definition from BigQuery
I want to duplicate a specific table's schema without the data.
Basically, create a clean table with a different name.
Say original table orders as:
a integer
b string
c float
I want to create: orders-copy as:
a integer
b string
c float
BigQuery offers the COPY option in the UI, but this also copies the d...
1 vote, 2 answers, 251 views
How to get the max value of column values in a record? (BigQuery)
I want to get the max value of each row, not the max value of a field.
For example, when I have a sample_table like this:
sample_table
|col1|col2|col3|
|--------------|
| 1 | 0 | 0 |
| 0 | 2 | 0 |
| 2 | 0 | 0 |
| 0 | 0 | 3 |
And the query and result I want is something like this:
query
SELECT SOM...
1 vote, 1 answer, 63 views
Cosine similarity between pair of arrays in Bigquery
I have created a table that has a pair of IDs and coordinates for each of them so that I can calculate the pairwise cosine similarity between them.
The table looks like this
The number of dimensions for the coordinates is currently 128, but it can vary. But the number of dimensions for a pair of IDs is always sa...
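Cosine similarity is dot(a, b) / (|a| |b|), which works for any dimension as long as both vectors in a pair have the same length, as the question states. In BigQuery this usually means unnesting both arrays WITH OFFSET and joining the elements by position; the Python sketch below only shows the formula itself.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """dot(a, b) / (|a| * |b|); both vectors must have the same dimension."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```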
1 vote, 3 answers, 162 views
Get a massive csv file from GCS to BQ
I have a very large CSV file (let's say 1TB) that I need to get from GCS onto BQ. While BQ does have a CSV-loader, the CSV files that I have are pretty non-standard and don't end up loading properly to BQ without formatting it.
Normally I would download the csv file onto a server to 'process it' and...
1 vote, 2 answers, 46 views
Create an array with NULL values/0 and find the array length excluding NULL/0
I want to find, for each row, the number of columns in a range that have a non-null value greater than 0.
I currently do this using CASE WHEN statements or IF-ELSE. But the number of columns I now have to consider has increased, and with it the number of CASE statements.
So I wanted to create an...
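The array idea reduces to counting the elements that are non-null and greater than 0, which stays the same no matter how many columns feed the array. A minimal Python sketch of that count, with None standing in for SQL NULL; `count_positive` is a hypothetical name.

```python
from typing import Optional, Sequence


def count_positive(values: Sequence[Optional[float]]) -> int:
    """Number of values that are non-NULL (not None) and greater than 0."""
    return sum(1 for v in values if v is not None and v > 0)


row = [None, 0, 3, -1, 2.5]
print(count_positive(row))  # 2
```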
1 vote, 1 answer, 219 views
List all the tables in a dataset in BigQuery using the bq CLI and store them in Google Cloud Storage
I have around 108 tables in a dataset. I am trying to extract all those tables using the following bash script:
# get list of tables
tables=$(bq ls "$project:$dataset" | awk '{print $1}' | tail -n +3)
# extract into storage
for table in $tables
do
bq extract --destination_format 'NEWLINE_DELIMITED_JSON...
1 vote, 2 answers, 44 views
Counting the occurrence of a substring from a delimited field
I have some data that looks like:
Sequence, length
abc, 1
bat, 1
abc > abc, 2
abc > bat, 2
ced > ced > ced > fan, 4
I'm trying to see the frequency of various strings as a new column to this data. For example:
Sequence, length, count_of_ced
abc, 1, 0
bat, 1, 0
abc > abc, 2, 0
abc > bat, 2, 0
ced > c...
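Splitting on the ' > ' delimiter and comparing whole elements avoids the false hits a plain substring count would give if 'ced' appeared inside a longer token. In BigQuery, SPLIT with UNNEST would play the same role; the Python sketch below shows the logic, with `count_token` as a hypothetical helper.

```python
def count_token(sequence: str, token: str, delimiter: str = " > ") -> int:
    """Count how many delimiter-separated elements of sequence equal token exactly."""
    return sum(1 for part in sequence.split(delimiter) if part.strip() == token)


for seq in ["abc", "bat", "abc > abc", "abc > bat", "ced > ced > ced > fan"]:
    print(seq, count_token(seq, "ced"))
```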
1 vote, 2 answers, 57 views
BigQuery: Is there a way to round a timestamp UP or DOWN to the NEAREST minute?
I've been trying to round UP and DOWN to the NEAREST minute in Bigquery. Does anyone know the best function and method to achieve this?
user_id | created_at
-------------------------------------
14451 | 2019-01-31 04:51:28 UTC
14452 | 2019-01-31 04:51:31 UTC
14453 | 2019-01-31...
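Rounding down is truncation to the minute (TIMESTAMP_TRUNC(ts, MINUTE) in BigQuery); rounding to the nearest minute can be done by adding 30 seconds before truncating, and rounding up by adding one minute unless the value is already on a minute boundary. A Python sketch of the three variants; treating exactly :30 as rounding up is an assumption, and the function names are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def floor_minute(ts: datetime) -> datetime:
    """Round down to the minute."""
    return ts.replace(second=0, microsecond=0)


def ceil_minute(ts: datetime) -> datetime:
    """Round up to the minute; values already on a minute boundary are unchanged."""
    floored = floor_minute(ts)
    return floored if floored == ts else floored + timedelta(minutes=1)


def round_minute(ts: datetime) -> datetime:
    """Round to the nearest minute; exactly 30 seconds rounds up."""
    return floor_minute(ts + timedelta(seconds=30))


created_at = datetime(2019, 1, 31, 4, 51, 28, tzinfo=timezone.utc)
print(round_minute(created_at))  # 2019-01-31 04:51:00+00:00
```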
1 vote, 3 answers, 45 views
How can you figure out if Column A contains something from Column B?
I've been trying to figure out a way to grab information from Table A Column A compared to Table B Column A, for example:
TableA
Name
abcd_1234_efgh
zxcdde_gets_3214_
jkil_uelso_5555_aseil
uuuu_kkkk_iiii_3333
TableB
ID
1234
3214
5555
3333
I've tried doing an INNER JOIN...
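A plain equality join cannot match here, because the IDs are embedded inside the names; the join condition has to be substring containment (in BigQuery, for example, STRPOS(Name, ID) > 0). A Python sketch of that containment join on the question's sample rows; `contains_join` is a hypothetical name, and substring matching can over-match (an ID of 123 would also match a name containing 1234).

```python
from typing import List, Tuple


def contains_join(names: List[str], ids: List[str]) -> List[Tuple[str, str]]:
    """Pair every name with every ID that appears inside it as a substring."""
    return [(name, id_) for name in names for id_ in ids if id_ in name]


table_a = ["abcd_1234_efgh", "zxcdde_gets_3214_", "jkil_uelso_5555_aseil", "uuuu_kkkk_iiii_3333"]
table_b = ["1234", "3214", "5555", "3333"]
for name, id_ in contains_join(table_a, table_b):
    print(name, id_)
```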
1 vote, 2 answers, 75 views
Get number of rows in a BigQuery table (streaming buffer)
I am doing inserts via Streaming. In the UI, I can see the following row counts:
Is there a way to get that via the API? Currently, when I do:
from google.cloud import bigquery
client = bigquery.Client()
dataset = client.dataset('bqtesting')
table = client.get_table(dataset.table('table_streaming'))
ta...
1 vote, 2 answers, 85 views
Flatten nested JSON string to different columns in Google BigQuery
I have a column in one of my BigQuery tables that looks like this:
{'name': 'name1', 'last_delivered': {'push_id': 'push_id1', 'time': 'time1'}, 'session_id': 'session_id1', 'source': 'SDK', 'properties': {'UserId': 'u1'}}
Is there any way to get output like this in GBQ? (basically flatten the...
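Each nested field can be turned into its own column named after its path; in BigQuery, JSON_EXTRACT_SCALAR with paths such as '$.last_delivered.push_id' does this one column at a time. The Python sketch below flattens the question's record generically; underscore-joined column names are an assumption, `flatten` is a hypothetical helper, and the sample is rewritten with double quotes because valid JSON requires them.

```python
import json
from typing import Any, Dict


def flatten(obj: Dict[str, Any], prefix: str = "") -> Dict[str, Any]:
    """Turn nested objects into a flat dict keyed by underscore-joined paths."""
    flat: Dict[str, Any] = {}
    for key, value in obj.items():
        name = f"{prefix}{key}"
        if isinstance(value, dict):
            flat.update(flatten(value, prefix=f"{name}_"))
        else:
            flat[name] = value
    return flat


row = json.loads(
    '{"name": "name1", "last_delivered": {"push_id": "push_id1", "time": "time1"}, '
    '"session_id": "session_id1", "source": "SDK", "properties": {"UserId": "u1"}}'
)
print(flatten(row))
```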
1 vote, 1 answer, 38 views
How to extract separate values from GeoJSON in BigQuery
I have a GeoJSON string for a multipoint geometry. I want to extract each of those points into a table of individual point geometries in BigQuery.
I have been able to create a point geometry for one of the points. I want to do it for all the others as well in an automated fashion. I've already tried conv...
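A GeoJSON MultiPoint stores its points as a `coordinates` array of [longitude, latitude] positions, so each position can become its own Point geometry. A Python sketch; `multipoint_to_points` is a hypothetical helper and the sample coordinates are hypothetical, since the question's GeoJSON is not shown.

```python
import json
from typing import Dict, List


def multipoint_to_points(geojson: str) -> List[Dict]:
    """Split a GeoJSON MultiPoint into one GeoJSON Point per position."""
    geometry = json.loads(geojson)
    if geometry.get("type") != "MultiPoint":
        raise ValueError("expected a MultiPoint geometry")
    return [{"type": "Point", "coordinates": coords} for coords in geometry["coordinates"]]


sample = '{"type": "MultiPoint", "coordinates": [[-122.4, 37.8], [-122.5, 37.7]]}'
for point in multipoint_to_points(sample):
    print(json.dumps(point))
```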
1 vote, 4 answers, 37 views
How do you query an array in Standard SQL that meets a certain conditional?
I am trying to pull records whose arrays only meet a certain condition.
For example, I want only the results that contain 'IAB3'.
Here is what the table looks like
Table Name:
bids
Column Names:
BidderBanner / WinCat
Entries:
1600402 / null
1911048 / null
1893069 / [IAB3-11, IAB3]
1214894 / IAB3
How...
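The filter needs an exact element match, so that 'IAB3' is found but 'IAB3-11' on its own is not; in BigQuery standard SQL that is `'IAB3' IN UNNEST(WinCat)`. A Python sketch on the question's sample rows, assuming WinCat is an array (the bare IAB3 entry is treated as a one-element array) and NULL is None; `bids_with_category` is a hypothetical name.

```python
from typing import List, Optional, Tuple

Bid = Tuple[int, Optional[List[str]]]  # (BidderBanner, WinCat)


def bids_with_category(bids: List[Bid], category: str) -> List[int]:
    """BidderBanner values whose WinCat array contains category as an exact element."""
    return [banner for banner, cats in bids if cats is not None and category in cats]


bids = [(1600402, None), (1911048, None), (1893069, ["IAB3-11", "IAB3"]), (1214894, ["IAB3"])]
print(bids_with_category(bids, "IAB3"))  # [1893069, 1214894]
```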
1 vote, 3 answers, 49 views
Find maxima and minima of time series values using SQL
I have a certain set of index values that increase and decrease over time. I wish to identify the time periods during which values rise and fall. The data looks like this:
I tried partitioning the values by the range and I definitely don't think I'm doing it right. Here's the query I wrote w...
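Rising and falling periods are bounded by turning points, which can be found by comparing each value with its neighbours (LAG and LEAD in SQL): a local maximum is where a rise turns into a fall, and a local minimum the reverse. A Python sketch; `turning_points` is a hypothetical name, and flat stretches with equal consecutive values are not reported as turning points, which is an assumption.

```python
from typing import List, Sequence, Tuple


def turning_points(values: Sequence[float]) -> List[Tuple[int, str]]:
    """Indices where the series switches from rising to falling ('max') or falling to rising ('min')."""
    points = []
    for i in range(1, len(values) - 1):
        before = values[i] - values[i - 1]  # like value - LAG(value)
        after = values[i + 1] - values[i]   # like LEAD(value) - value
        if before > 0 and after < 0:
            points.append((i, "max"))
        elif before < 0 and after > 0:
            points.append((i, "min"))
    return points


print(turning_points([1, 3, 5, 4, 2, 3, 6]))  # [(2, 'max'), (4, 'min')]
```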
1 vote, 2 answers, 40 views
Using the append model to do partial row updates in BigQuery
Suppose I have the following record in BQ:
id name age timestamp
1 'tom' 20 2019-01-01
I then perform two 'updates' on this record by using the streaming API to 'append' additional data -- https://cloud.google.com/bigquery/streaming-data-into-bigquery. This i...
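If each appended row carries only the fields that changed, with the rest NULL, the current state of a record can be rebuilt by taking, per field, the latest non-null value ordered by timestamp. A Python sketch of that merge; `current_state` is a hypothetical name, and the two update rows are hypothetical, since the question's excerpt is cut off before describing them.

```python
from datetime import date
from typing import Any, Dict, List


def current_state(versions: List[Dict[str, Any]]) -> Dict[str, Any]:
    """Merge appended partial rows: for each field, keep the latest non-None value by timestamp."""
    merged: Dict[str, Any] = {}
    for row in sorted(versions, key=lambda r: r["timestamp"]):
        for field, value in row.items():
            if value is not None:
                merged[field] = value  # later timestamps overwrite earlier ones
    return merged


history = [
    {"id": 1, "name": "tom", "age": 20, "timestamp": date(2019, 1, 1)},
    {"id": 1, "name": None, "age": 21, "timestamp": date(2019, 1, 2)},       # hypothetical update
    {"id": 1, "name": "thomas", "age": None, "timestamp": date(2019, 1, 3)},  # hypothetical update
]
print(current_state(history))
```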
0 votes, 1 answer, 12 views
Connecting Spreadsheet to BigQuery
I want to connect a Google Spreadsheet to a new BigQuery table that populates and updates the data automatically.
I'm using this tutorial to do the setup.
My problem: I had to configure each column manually, and the table came out empty, so I have to query it into another table to bring in the data.
I'm not exp...
1 vote, 1 answer, 91 views
Loading Avro Data into BigQuery via command-line?
I have created an Avro Hive table and loaded data into it from another table using the Hive INSERT OVERWRITE command. I can see the data in the Avro Hive table, but when I try to load it into a BigQuery table, it gives an error.
Table schema:-
CREATE TABLE `adityadb1.gold_hcth_prfl_datatype_accepten...
1 vote, 0 answers, 286 views
Google Cloud Dataprep - Scan for multiple input csv and create corresponding bigquery tables
I have several csv files on GCS which share the same schema but with different timestamps for example:
data_20180103.csv
data_20180104.csv
data_20180105.csv
I want to run them through Dataprep and create BigQuery tables with corresponding names. This job should run every day on a scheduler.
Righ...
1 vote, 0 answers, 305 views
Dynamically write to tables in Dataflow
I'm working on a pipeline in Dataflow. I need to write values to multiple BigQuery tables, where the desired table names are values in a PCollection.
For example with class Data as:
public class Data{
public List tableName;
public String id;
public String value;
}
I will have a PCollection and I would like...
1 vote, 0 answers, 119 views
Can't run App Engine locally with BigQuery
When I import BigQuery:
from google.cloud import bigquery
I get the following error:
Traceback (most recent call last):
File '/Users/manuelgodoy/Projects/Klein/src/application/storage/bigquery_models.py', line 6, in
from google.cloud import bigquery
File '/Library/Frameworks/Python.framework/Versio...
1 vote, 1 answer, 108 views
Options to load data into BigQuery from Google Cloud Storage programmatically in Java?
I have been searching for a way to load data into BigQuery programmatically from Google Cloud Storage. I have done this manually by taking a backup of one Kind to my Google Cloud Storage and dumping it into the BigQuery table, and I was able to retrieve the data in Android as well. The only problem I am facing is...
1 vote, 1 answer, 51 views
How to integrate Google Cloud Platform into my company's iOS app [closed]
My company currently stores real time manufacturing data (< 10 GigaBytes) locally in Microsoft SQL Server. We would like to push this data to the cloud and serve it to US clients in an iOS app, preferably in real-time.
I have experience with Firebase and Cloud Functions, but not Google Cloud. What...
1 vote, 0 answers, 310 views
Is there a tool to efficiently export a BigQuery table to BigTable?
Is there a tool to efficiently export a BigQuery table or query result to a BigTable table?
Ideally this would be a single Dataflow program that did a BigQuery query on a table and wrote the results to a BigTable table, with a designated key column, and corresponding column names for all the fields...
1 vote, 1 answer, 345 views
Updating Repeated Array Struct BigQuery
I have (apparently with 104559 rows affected) successfully updated an element of my session_user repeated field (within the table definition see below) with:
UPDATE genderfitnessdev.gfa_talend_dev.gfa_employment_cu
set session_user = ARRAY (
SELECT AS STRUCT * REPLACE('399975' as level_0)
FROM UNNES...