Questions tagged [amazon-redshift]

1

votes
3

answer
36

Views

Run a JOIN statement that excludes duplicate rows

I have a table with duplicated entries (they have the same uid). I want to remove the duplicates from the query result by using a GROUP BY clause (one of the entries is valid, but which one is random; I can only find out what the value is by joining it with db2 via rid). I am using an aggregate function (MA...
Sebastian
1

votes
1

answer
41

Views

Union all of 200 select statements failing to even execute. No error thrown. Limitation of number of select statements in union all?

Because of the limitations we have in Amazon Redshift SQL (which is based on PostgreSQL 8.0.2), I am forced to execute the following query for some other complex query purposes: create temporary table NS AS ( select 1 as n union all select 2 union all select 3 union all select 4 union all select 5...
Rahul Yadav
1

votes
1

answer
40

Views

Looking to filter query based on last two digits in price column

I'm trying to run an analysis that looks at conversion changes based on price point. To do this, I am trying to bucket prices based on their last two digits. How would I go about filtering based on the last two digits in the WHERE clause? I'll want to filter prices ending in .99, (.25, .50, .75), .%0, etc.
d.tang
1

votes
1

answer
56

Views

Enabling Encryption on a Redshift Cluster with existing data

I've been charged with enabling encryption on a Redshift cluster which has a significant amount of existing data. Based on this link, I know that when encryption is enabled it will create a new cluster and copy the existing data across, making access to it read-only during this time. We have a number of ETL jobs tha...
Paddy
1

votes
0

answer
676

Views

“Query compilation failed” for redshift query

I'm using psql (postgresql 10.1) to access a table on AWS/redshift. The following query gives me a very cryptic error: => select to_timestamp(json_extract_path_text(data, 'scheduling_at'), 'YYYY-MM-DD'T'HH24:MI:SS.MS'Z'') as ts from my_table; ERROR: Error: DETAIL: ----------------------------------...
Luis
1

votes
0

answer
75

Views

Splitting comma-separated values, in more than one column, into rows

I'm running a SQL query on Redshift. I have a table (@tA) with more than one column containing comma-separated values - create table* @tA ( col1 varchar(100), col2 varchar(100), col3 varchar(100) ) @tA - col1 | col2 | col3 a1 b1 c1 a2 b2,b3 c2,c3,c4 a3 b4 c5 I want the...
Durgesh panwar
1

votes
1

answer
80

Views

AWS Redshift failed to make a valid plan when trying to run a complicated query

I'm running a complicated query against a Redshift cluster that uses 4 tables, some of which have billions of rows, and I get the following error: failed to make a valid plan. If I limit the data, the query runs successfully.
Cyrus
1

votes
0

answer
82

Views

Create library tld for Redshift

I want to integrate into Redshift the tld utilities defined here: https://github.com/barseghyanartur/tld I think I have to create a module as explained here: https://docs.aws.amazon.com/redshift/latest/dg/udf-python-language-support.html but I'm a newbie in Python. How can I do that?
kalaoke
1

votes
0

answer
174

Views

Recreating a varbinary hex string in AWS Redshift via Python UDF

I have a SQL Server database where uniqueidentifiers are converted and stored as varbinary(8) i.e. DECLARE @OrganisationID uniqueidentifier = '910D514A-8706-4BA1-9327-FE92EF4165E3'; SELECT CONVERT(varbinary(8), @OrganisationID, 1); /* 0x4A510D910687A14B */ I now need to recreate these hexadecimal re...
fez
1

votes
0

answer
392

Views

Redshift External tables via Hive metastore

I have a Redshift DB set up and we periodically archive the data into S3. I would like to create Redshift external tables on top of these archived files. The AWS documentation suggests that this can be done either via Athena or via a Hive metastore. Since Athena is quite expensive, I would like to get thi...
Sneha
1

votes
2

answer
280

Views

WITH Clause SQL throws error when deleting rows but works fine for select statement

with de_duplicate (ad_id, id_type, lat, long) AS ( select ad_id, id_type, lat, long, Row_Number() over(partition by ad_id,id_type, lat, long) AS duplicate_count from tempschema.temp_test) select * from de_duplicate; The above runs successfully, but when I try to perform a delete operation with de_duplicat...
venkatesh Mora
1

votes
0

answer
142

Views

Correlated subquery in Amazon Redshift

The issue I am facing is that a correlated subquery works fine in Redshift version 1.0.1564, but the same query throws "Amazon Invalid operation: This type of correlated subquery pattern is not supported due to internal error" in Redshift version 1.0.1657. Any thoughts on this are highly appreciated B...
Surya
1

votes
1

answer
66

Views

IoT Big Data design on AWS

I'm trying to design a big IoT solution of millions of devices starting from zero. That's why I need a highly scalable platform like AWS. My devices are going to report data using AWS IoT, and that's the only thing I've really decided. I need to store a lot of data like a temperature measure every 1...
HdAlabama
1

votes
1

answer
137

Views

SQL Regex number not followed by a string

Let me first mention that this is a well-discussed problem and I have gone through several threads, including these two, which are the closest matches: "Regex to match a string not followed by some string" and "A regex to match a substring that isn't followed by a certain other substring", but they did not solve my...
Ali
1

votes
0

answer
74

Views

Unable to write back to Redshift using R from Tableau

I have a requirement to use R code to write the values of Tableau parameters back to a database. I am using the code below for this purpose. The code works perfectly fine in R; however, Tableau shows a popup for running the table calculation that never finishes (I waited for up to 10 minutes). Below is my c...
Vaibhav
1

votes
1

answer
83

Views

Conditional time to status calculation

I am trying to calculate how long it takes a rep to have x amount of clients apply for service: meaning I need the time between date_created - ie. date the rep was onboarded, and when rep reaches a certain 'status'. Status is reached when x of the rep's clients (= users) have a non-null date_applied...
user8834780
1

votes
0

answer
33

Views

Assign a sequence (session ID) to my table based on a value in a field

I am manually assigning a 'Session ID' to my result set. I did this by ordering all events and, if the time difference between the current and next event is greater than 2 minutes, setting the 'session' field to 'New Session'. My result set now looks like this. Table name : tbl_sessions ╔═══...
Steve
1

votes
1

answer
47

Views

Multiple group by statements?

I am trying to group by location by day and need some help with the group by (or multiple group by statements?). The table looks like: location timestamp traffic 1 US 2018-01-31 155 2 EU 2018-01-31 574 3 US 2018-01-30 149 4 EU 2018-01-30 150 5 US 2018-01-30 100 and I am tryi...
hhh_
1

votes
2

answer
159

Views

Get tables from schema

I have one database named dev. Inside it I have a schema named test_spect. The schema consists of some tables. The test_spect schema is not public. How do I get the table names and their data from test_spect? When I run \dt test_spect.* it says "No matching relations found". How can I solve this?
1

votes
1

answer
193

Views

Amazon Redshift Spectrum under the hood?

I am just curious to know what happens under the hood when you run a query in Redshift Spectrum. Is it running a Spark job, a MapReduce job, Presto, or something totally different?
Am1rr3zA
1

votes
2

answer
598

Views

AWS Lambda Connect to Redshift DB using ODBC Connection

I am trying to connect to a Redshift DB using AWS Lambda from a .NET Core 2.0 C# app. Below is my approach. string connString = 'Driver={Amazon Redshift (x86)};' + String.Format('Server={0};Database={1};' + 'UID={2};PWD={3};Port={4};SSL=true;Sslmode=Require', RedShiftServer, RedShiftDBName, RedShiftUser...
Digambar
1

votes
1

answer
1.8k

Views

Redshift LIKE column value with %

I have a column that is a comma-separated string of values. I want to join another table that only has one of the values. On Redshift, how can I do a LIKE operator with '%' injected into the comparison? Ex: TableA: values_col = 'abc, def' TableB: value_col = 'def' SELECT * FROM TableA a JOIN TableB...
cvax
1

votes
1

answer
109

Views

Does Talend's tRedshiftUnload component support IAM roles?

I am using Talend Open Studio for Big Data (version 6.4.1) to UNLOAD a dataset from Amazon Redshift to S3. The UNLOAD operation works fine when the S3 Access Key and Secret Key are provided. Is there a way to include the IAM role ARN instead of the S3 Access Key and Secret Key to perform this UNLOAD oper...
user_default
1

votes
0

answer
336

Views

Kinesis Firehose with Lambda decorator getting throttled

I am using Firehose with a lambda decorator to ingest vpc flow logs into Redshift. (VPC Flow Logs -> Kinesis Data Stream -> Kinesis Firehose -> Lambda Decorator -> Redshift) The volume of traffic is high which causes the lambda to error out with task timed out when reingesting unprocessed records ba...
kilomo
1

votes
0

answer
78

Views

“?” quantifier being recognized as a parameter when used in a sql posix statement, redshift

I'm trying to run something similar to the below on a table in redshift SELECT * FROM table t WHERE t.pagepath ~ '^www\\.example\\.[a-z\\.]+\\/web(\\/)?(#\\/)?(home.*)?(\\?.*)?$' The pagepath column contains URLs, I am trying to group them based on what part of the site they represent. t...
Jonathan White
1

votes
0

answer
44

Views

Database Connection Issue with RRedshiftSQL/RPostgreSQL

I'm trying to write Redshift queries in R. The documentation calls for something like this: con
RIPHarambe
1

votes
0

answer
140

Views

spark-redshift unload data. Directory name and save in CSV

I have a problem with the spark-redshift lib from Databricks. Actually, I have 2 questions. How can I UNLOAD data in .csv format with headers? How can I set a CUSTOM directory name? By that I mean ignoring those random numbers INFO RedshiftRelation: UNLOAD ('SELECT 'companynumber', 'companyname' FROM (select c...
Laharos
1

votes
0

answer
255

Views

Redshift Spectrum: getting no values (empty result) on select when using Parquet

I have tried using textfile and it works perfectly. I am using Redshift Spectrum. To increase performance, I am trying to use PARQUET. The table gets created, but no values are returned when I run a SELECT query. Below are my queries: CREATE EXTERNAL TABLE gf_spectrum.order_headers ( header_id num...
Prajakta Yerpude
1

votes
0

answer
60

Views

SQL query to show user session length

I have a table that looks like this: user_id page happened_at 2 'page3' 2017-10-05 11:31 1 'page2' 2016-02-01 00:02 2 'page1' 2017-10-05 15:24 3 'page3' 2017-03-31 19:35 4 'page1' 2017-07-09 00:24 2 'page3' 2017-1...
Alex Nikitin
1

votes
1

answer
345

Views

Update a base table with a staging table that references the columns to update[REDSHIFT]

I am having trouble with the below example. I have 2 tables, snapshot and staging. Staging contains the following columns: |person_id|column_changed|new_value| |1 | color | orange | |1 | sport | football| Snapshot contains the following columns: |person_id| color| sport...
pippa dupree
1

votes
1

answer
93

Views

SSIS connection is not working with Amazon Redshift after exporting the project from SQL Server

I built an SSIS package which truncated some tables in Amazon Redshift. I need to make some enhancements, so I exported the project from SSIS and am trying to edit the package. Now the old connection is not working, and it is not giving any specific error either. I am trying to use the same user and...
Ayan Chakraborty
1

votes
1

answer
102

Views

How to get Redshift to add the current time for a field specified in the COPY command

I have a TSV file that I want to load into Redshift via the COPY command. I want one of the fields in the table to be a timestamp that registers the time the row was loaded. I have defined a field like this: ts TIMESTAMP DEFAULT CURRENT_TIMESTAMP This works fine if I insert into this row at the psql...
Bruce
1

votes
2

answer
274

Views

Redshift LISTAGG frame clause

I am trying to aggregate strings, but limited to only the preceding rows, not the whole partition. Does anyone know how to do this in Redshift? What I am trying to achieve is the appended_event_namespace column below. This is what I've tried so far. LISTAGG(event_namespace, '/') WITHIN GROUP (ORDER...
cvax
1

votes
0

answer
238

Views

Amazon Redshift Python UDF with Shapely: permission denied error when trying to use GEOS functions

I have an Amazon Redshift database set up, which contains a table with geographical markers using 3 fields: id, latitude (float), longitude (float). My goal is to write a Python UDF using the shapely library to: Parse a multipolygon in the form of a WKT string, and Return a boolean stating whether t...
user8834864
1

votes
0

answer
182

Views

Kafka Connect from RDS to RedShift not starting

I was able to implement Kafka Connect on a much smaller table but am trying to implement it on a larger database. My source and sink configurations are as follows. source: name=rds-source connector.class=io.confluent.connect.jdbc.JdbcSourceConnector table.whitelist=users,places,sales tasks.max=1 conn...
Minh Mai
1

votes
0

answer
46

Views

Window function query on transaction data

This is my ethereum_transaction_receipts table in redshift: CREATE TABLE 'crypto_blockchains'.'ethereum_transaction_receipts' ( 'hash' character varying(256), 'block_number' bigint, 'tx_time' timestamp without time zone, 'event_generator_address' character varying(256), 'event_hash' character varyin...
rajat
1

votes
1

answer
63

Views

How to write two Spark DataFrames to Redshift atomically?

I am using Databricks spark-redshift to write DataFrames to Redshift. I have two DataFrames that get appended to two separate tables, but I need this to happen atomically, i.e. if the second DataFrame fails to write to its table, I'll need the first one to be undone as well. Is there any way to do t...
lfk
1

votes
1

answer
126

Views

Amazon Redshift node parallel requests take longer than sequential

I am trying to run a bunch of select queries on redshift from my node app using node-redshift and pg. If I run my queries in sequence, I get an average of 2 seconds per query. But when I run my queries in parallel, they take much longer, about 32 seconds for the last one. I can clearly see that the...
Achshar
1

votes
1

answer
226

Views

Unable to import module 'copy': /var/task/psycopg2/_psycopg.so: ELF file's phentsize not the expected size

I am following https://github.com/christianhxc/aws-lambda-redshift-copy When I try to test it, I run into this error: Unable to import module 'copy': /var/task/psycopg2/_psycopg.so: ELF file's phentsize not the expected size This is the file structure in AWS. Any help is appreciated :) Thanks!
Dharmesh
1

votes
0

answer
243

Views

JSONPaths file: Parse a JSON object contained within a JSON array

I have rows of the following JSON form: [ { 'id': 1, 'costs': [ { 'blue': 100, 'location':'courts', 'sport': 'football' } ] } ] I want to upload this into a redshift table as follows: id | blue | location | sport --------+------+---------+------ 1 | 100 | courts |football The following JSONP...
pippa dupree
