Questions tagged [aws-data-pipeline]

1 vote · 0 answers · 50 views

How to run multiple steps in AWS Data Pipeline using the AWS console

I have a use case for scheduling my Spark jobs on EMR. Every time, we will spin up a new cluster and run a Spark job. I went through the documentation provided by AWS, but it is not extensive enough to give a clear picture of how to do it. If anyone knows, please share the solution with step-by-ste...
Raghav salotra
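
The usual shape of an answer here is a pipeline definition containing a transient EmrCluster plus an EmrActivity whose step runs spark-submit. Below is a minimal sketch of those two objects in the form boto3's `put_pipeline_definition` expects; all names, instance types, and S3 paths are illustrative assumptions, not taken from the question.

```python
# Sketch: pipeline objects for a transient EMR cluster that runs one
# Spark job. Cluster size, release label, jar path, and class name are
# illustrative assumptions.
def emr_spark_pipeline_objects(jar_path, main_class):
    """Build Data Pipeline objects: a transient EmrCluster and an EmrActivity."""
    cluster = {
        "id": "MyEmrCluster",
        "name": "MyEmrCluster",
        "fields": [
            {"key": "type", "stringValue": "EmrCluster"},
            {"key": "releaseLabel", "stringValue": "emr-5.29.0"},
            {"key": "applications", "stringValue": "spark"},
            {"key": "masterInstanceType", "stringValue": "m5.xlarge"},
            {"key": "coreInstanceType", "stringValue": "m5.xlarge"},
            {"key": "coreInstanceCount", "stringValue": "2"},
            {"key": "terminateAfter", "stringValue": "2 Hours"},
        ],
    }
    activity = {
        "id": "MySparkStep",
        "name": "MySparkStep",
        "fields": [
            {"key": "type", "stringValue": "EmrActivity"},
            {"key": "runsOn", "refValue": "MyEmrCluster"},
            # EmrActivity steps are comma-separated argument lists.
            {"key": "step",
             "stringValue": f"command-runner.jar,spark-submit,--class,{main_class},{jar_path}"},
        ],
    }
    return [cluster, activity]

objects = emr_spark_pipeline_objects("s3://my-bucket/jobs/etl.jar", "com.example.Etl")
# To register (assumes an existing pipeline id):
# boto3.client("datapipeline").put_pipeline_definition(
#     pipelineId=pipeline_id, pipelineObjects=objects)
```

Scheduling then comes from adding a Schedule object (or an on-demand activation) to the same definition.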
1 vote · 1 answer · 130 views

Permissions for creating and attaching an EBS volume to an EC2Resource in AWS Data Pipeline

I need more local disk than is available to EC2Resources in an AWS Data Pipeline. The simplest solution seems to be to create and attach an EBS volume. I have added the EC2:CreateVolume and EC2:AttachVolume policies to both DataPipelineDefaultRole and DataPipelineDefaultResourceRole. I have also tried sett...
Knut Hellan
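
A detail worth checking in this situation: IAM action names use the lowercase service prefix `ec2:`, not `EC2:`, and detach/delete/describe permissions are typically needed alongside create/attach for cleanup. A sketch of the policy statement, as an assumption of what the roles would need (shown as a Python dict for illustration):

```python
import json

# Sketch of an IAM policy granting the EBS volume actions the question
# is after. Note the lowercase "ec2:" action prefix; "Resource": "*" is
# a permissive placeholder that could be narrowed with conditions.
ebs_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "ec2:CreateVolume",
                "ec2:AttachVolume",
                "ec2:DetachVolume",
                "ec2:DeleteVolume",
                "ec2:DescribeVolumes",
            ],
            "Resource": "*",
        }
    ],
}
print(json.dumps(ebs_policy, indent=2))
```

The statement would be attached to DataPipelineDefaultResourceRole (the instance profile role), since it is the EC2 instance itself that issues the attach call.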
1 vote · 1 answer · 25 views

Processing parameters passed to a SQL activity in AWS Data Pipeline

I am working with AWS Data Pipeline. In this context, I am passing several parameters from the pipeline definition to a SQL file as follows: s3://reporting/preprocess.sql,-d,RUN_DATE=#{@scheduledStartTime.format('YYYYMMdd')}' My SQL file looks like this: CREATE EXTERNAL TABLE RESULT ( STUDENT_ID...
Joy
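
The `-d NAME=value` arguments in the question's script URI list follow the Hive convention, where the script references them as `${NAME}`. A local illustration of that substitution (this is not the pipeline's own code; the SQL template is an assumption):

```python
# Sketch: how "-d RUN_DATE=..." style arguments are conventionally
# consumed by a script that references ${RUN_DATE}.
import re

def substitute(sql, params):
    """Replace ${NAME} placeholders with values passed via -d arguments."""
    return re.sub(r"\$\{(\w+)\}", lambda m: params.get(m.group(1), m.group(0)), sql)

args = ["-d", "RUN_DATE=20190401"]  # as produced by the pipeline definition
params = dict(a.split("=", 1) for a in args if "=" in a)

sql = "CREATE EXTERNAL TABLE RESULT_${RUN_DATE} (STUDENT_ID INT);"
print(substitute(sql, params))
# -> CREATE EXTERNAL TABLE RESULT_20190401 (STUDENT_ID INT);
```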
1 vote · 1 answer · 32 views

Export existing DynamoDB items to a Lambda function

Is there any AWS managed solution which would allow me to perform what is essentially a data migration using DynamoDB as the source and a Lambda function as the sink? I’m setting up a Lambda to process DynamoDB streams, and I’d like to be able to use that same Lambda to process all the existing...
Matthew Pope
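
One common workaround is to scan the table and replay each existing item through the same Lambda, wrapped in an event shaped like (a subset of) a DynamoDB Streams `INSERT` record. Table and function names below are assumptions; the event shape mimics the stream record structure rather than reproducing it in full.

```python
# Sketch: wrapping existing DynamoDB items in stream-style events so a
# streams-processing Lambda can handle historical data unchanged.
def to_stream_event(items):
    """Build a DynamoDB Streams-like event from plain table items."""
    return {
        "Records": [
            {
                "eventName": "INSERT",
                "eventSource": "aws:dynamodb",
                "dynamodb": {"NewImage": item},
            }
            for item in items
        ]
    }

items = [{"pk": {"S": "user#1"}}, {"pk": {"S": "user#2"}}]
event = to_stream_event(items)
print(len(event["Records"]))  # 2

# With boto3, one would page through the table and invoke the function:
# dynamodb = boto3.client("dynamodb"); lam = boto3.client("lambda")
# for page in dynamodb.get_paginator("scan").paginate(TableName="my-table"):
#     lam.invoke(FunctionName="my-fn",
#                Payload=json.dumps(to_stream_event(page["Items"])))
```

If the Lambda keys on `OldImage` or sequence numbers, the synthetic events would need to account for that.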
1 vote · 1 answer · 0 views

AWS Data Pipeline vs Step Functions

I am working on a problem where we intend to perform multiple transformations on data using EMR (SparkSQL). After going through the documentation for AWS Data Pipeline and AWS Step Functions, I am slightly confused as to which use case each tries to solve. I looked around but did not find a...
archilius
0 votes · 0 answers · 10 views

Copy data from PostgreSQL to S3 using AWS Data Pipeline

I am trying to copy all the tables from a schema (PostgreSQL, 50+ tables) to Amazon S3. What is the best way to do this? I am able to create 50 different copy activities, but is there a simple way to copy all the tables in a schema, or to write one pipeline and loop?
Visss
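
Since a pipeline definition is just a list of objects, the 50 activities can be generated in a loop rather than written by hand: one SqlDataNode, S3DataNode, and CopyActivity per table. A sketch, with illustrative IDs, bucket name, and table list:

```python
# Sketch: emit one CopyActivity (SqlDataNode -> S3DataNode) per table so
# a whole schema can be defined from a loop. Paths and IDs are assumptions.
def copy_objects_for_table(table, bucket):
    source = {
        "id": f"Src_{table}", "name": f"Src_{table}",
        "fields": [
            {"key": "type", "stringValue": "SqlDataNode"},
            {"key": "table", "stringValue": table},
            {"key": "selectQuery", "stringValue": f"select * from {table}"},
        ],
    }
    dest = {
        "id": f"Dst_{table}", "name": f"Dst_{table}",
        "fields": [
            {"key": "type", "stringValue": "S3DataNode"},
            {"key": "filePath", "stringValue": f"s3://{bucket}/{table}/{table}.csv"},
        ],
    }
    copy = {
        "id": f"Copy_{table}", "name": f"Copy_{table}",
        "fields": [
            {"key": "type", "stringValue": "CopyActivity"},
            {"key": "input", "refValue": f"Src_{table}"},
            {"key": "output", "refValue": f"Dst_{table}"},
        ],
    }
    return [source, dest, copy]

tables = ["users", "orders"]  # in practice, the 50+ tables of the schema
objects = [o for t in tables for o in copy_objects_for_table(t, "my-bucket")]
print(len(objects))  # 6
```

The table list itself could be pulled from `information_schema.tables` before building the definition.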
0 votes · 0 answers · 2 views

Has anyone used an AWS Systems Manager parameter in Data Pipeline to assign a value to a pipeline parameter?

'id': 'myS3Bucket', 'type': 'String', 'default': '\'aws ssm get-parameters --names variable --query \'Parameters[*].{myS3Bucket:Value}\'\'' I tried this, where I created a variable in AWS Parameter Store and was able to retrieve the value using this command in the AWS CLI, but I am not able to retrieve the value...
user11204973
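
A default value in a pipeline definition is a literal string, so Data Pipeline will not execute a CLI command embedded there. One approach is to resolve the SSM parameter first and pass the result in at activation time. The helper below parses the JSON shape that `aws ssm get-parameters` returns; the parameter name and bucket value are assumptions.

```python
# Sketch: resolve the SSM value outside the pipeline, then supply it as
# a parameter value instead of embedding the CLI string as a default.
import json

def bucket_from_ssm_response(response_json):
    """Extract the first parameter value from an ssm get-parameters response."""
    return json.loads(response_json)["Parameters"][0]["Value"]

cli_output = '{"Parameters": [{"Name": "variable", "Value": "my-s3-bucket"}], "InvalidParameters": []}'
print(bucket_from_ssm_response(cli_output))  # my-s3-bucket

# With boto3, one would resolve and pass the value at activation:
# value = boto3.client("ssm").get_parameter(Name="variable")["Parameter"]["Value"]
# boto3.client("datapipeline").activate_pipeline(
#     pipelineId=pid,
#     parameterValues=[{"id": "myS3Bucket", "stringValue": value}])
```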
0 votes · 2 answers · 15 views

AWS to GCP data migration

I'm planning a data migration from AWS MySQL instances to GCP BigQuery. I don't want to migrate every MySQL database, because ultimately I want to create a data warehouse using BigQuery. Would exporting the AWS MySQL DBs to S3 buckets as CSV/JSON/Avro, then transferring them to GCP buckets, be a good option? What woul...
Buddhika Sameera
0 votes · 0 answers · 3 views

AWS Data Pipeline: execute a Redshift query for every line in an S3 file

I have a partitioned file (a few files with the same name template) in S3. Each file contains a few lines of data. I would like to execute a Redshift query for each line of a file, substituting parameters from that line. I need to do this in AWS Data Pipeline. For example, file: val1,val2,val3 val4,val5,val6 val...
Evgeny Makarov
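
The per-line expansion itself can be done in a small script, for instance inside a ShellCommandActivity that stages the file and issues one statement per line. A sketch of the expansion step; the SQL template and column names are assumptions:

```python
# Sketch: expand each comma-separated line of the staged S3 file into
# one Redshift statement. Template and target table are illustrative.
def queries_from_lines(lines, template):
    """Fill the positional {0}, {1}, ... slots of template from each CSV line."""
    out = []
    for line in lines:
        vals = [v.strip() for v in line.split(",")]
        out.append(template.format(*vals))
    return out

file_lines = ["val1,val2,val3", "val4,val5,val6"]
template = "insert into target (a, b, c) values ('{0}', '{1}', '{2}');"
for q in queries_from_lines(file_lines, template):
    print(q)
```

Each generated statement could then be executed against the Redshift endpoint with a PostgreSQL driver such as psycopg2.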