Transfer data from AWS to GCP using Storage Transfer Service

Overview

Storage Transfer Service enables users to quickly and securely transfer data to, from, and between object and file storage systems, including Google’s Cloud Storage, Amazon S3, Azure Blob Storage, and on-premises data.

This blog post walks you through the process of securely transferring data from AWS S3 to Google’s Cloud Storage using identity federation.

Identity federation creates a trust relationship between Google Cloud and AWS. It allows you to access resources directly with a short-lived access token, and it eliminates the maintenance and security burden associated with long-term credentials such as service account keys. With identity federation, you do not have to worry about rotating keys or explicitly revoking them when Storage Transfer Service is not in use.

Steps to configure a storage transfer job to transfer data from AWS S3 to GCS

This section walks you through setting up the infrastructure to transfer data securely from Amazon Web Services to Google Cloud.

Configurations on Google Cloud 

curl --location --request GET 'https://storagetransfer.googleapis.com/v1/googleServiceAccounts/<Replace Project number>' --header 'X-Goog-User-Project: <Replace Project ID>' --header 'Authorization: Bearer <token>'

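This request retrieves the Google-managed service account that Storage Transfer Service uses for your project. Replace the project number, project ID, and access token placeholders with your own values.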

The output of this command will be in the format:

{
  "accountEmail": "<service account email>",
  "subjectId": "<service account subject ID>"
}

NOTE: Make a note of the subjectId as this will be used in the AWS IAM role trust relationship policy.
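The same lookup can be scripted. The following is a minimal Python sketch that mirrors the curl command above using only the standard library. The function names are hypothetical, and obtaining the access token (for example with gcloud) is assumed to happen elsewhere.

```python
import json
import urllib.request

API_ROOT = "https://storagetransfer.googleapis.com/v1"


def build_service_account_request(project_number, project_id, token):
    """Return the URL and headers used by the curl command above."""
    url = f"{API_ROOT}/googleServiceAccounts/{project_number}"
    headers = {
        "X-Goog-User-Project": project_id,
        "Authorization": f"Bearer {token}",
    }
    return url, headers


def extract_subject_id(response_body):
    """Pull the subjectId field out of the JSON response shown above."""
    return json.loads(response_body)["subjectId"]


def fetch_subject_id(project_number, project_id, token):
    """Send the GET request (needs network access and a valid token)."""
    url, headers = build_service_account_request(project_number, project_id, token)
    request = urllib.request.Request(url, headers=headers, method="GET")
    with urllib.request.urlopen(request) as response:
        return extract_subject_id(response.read().decode("utf-8"))
```

Splitting the request construction from the network call keeps the URL and header logic easy to check without credentials.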

Configurations on Amazon Web Services

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "s3:Get*",
                "s3:List*",
                "s3:Delete*"
            ],
            "Resource": "*"
        }
    ]
}

NOTE: This policy grants access to all S3 resources and can be further restricted to a single S3 bucket. The s3:GetBucketLocation permission (covered here by s3:Get*) is needed to fetch the bucket location.
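As an illustration of the restriction mentioned in the note, here is a small Python sketch that generates the same policy scoped to one bucket. The helper name and the bucket name "source-bucket" are placeholders, not values from the original setup.

```python
import json


def build_bucket_policy(bucket_name):
    """Build the S3 access policy above, limited to a single bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["s3:Get*", "s3:List*", "s3:Delete*"],
                # Bucket-level actions match the bucket ARN;
                # object-level actions match the "/*" ARN.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_bucket_policy("source-bucket"), indent=4))
```

Both ARNs are listed because actions such as s3:ListBucket apply to the bucket itself, while actions such as s3:GetObject apply to the objects inside it.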

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {
                "Federated": "accounts.google.com"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {
                "StringEquals": {
                    "accounts.google.com:sub": "<Replace with Google Cloud service account subject ID>"
                }
            }
        }
    ]
}
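To avoid pasting the subject ID by hand, the trust relationship policy above could be generated. This is a minimal sketch with a hypothetical helper name; the placeholder check is only a convenience and not part of any AWS or Google API.

```python
import json


def build_trust_policy(subject_id):
    """Fill the trust policy above with the service account subject ID."""
    if not subject_id or subject_id.startswith("<"):
        raise ValueError("subject_id must be a real subjectId, not a placeholder")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Federated": "accounts.google.com"},
                "Action": "sts:AssumeRoleWithWebIdentity",
                "Condition": {
                    "StringEquals": {"accounts.google.com:sub": subject_id}
                },
            }
        ],
    }


if __name__ == "__main__":
    # "1234567890" is an illustrative value only.
    print(json.dumps(build_trust_policy("1234567890"), indent=4))
```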

Storage transfer job configuration

Custom scheduler for Storage Transfer Service

STS currently supports a minimum sync schedule of 1 hour. As a workaround, Cloud Scheduler can trigger a Cloud Function that starts the transfer job, which reduces the sync interval to minutes or lets you run the job on a custom schedule.
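The function behind such a scheduler could start an existing transfer job through the transferJobs.run REST method. The sketch below uses only the Python standard library; the function names are hypothetical, and how the function obtains its access token and how it is wired to Cloud Scheduler are assumptions left out of the example.

```python
import json
import urllib.request

API_ROOT = "https://storagetransfer.googleapis.com/v1"


def build_run_request(job_name, project_id, token):
    """Build a POST request for transferJobs.run.

    job_name is the full job name, e.g. "transferJobs/123".
    """
    url = f"{API_ROOT}/{job_name}:run"
    body = json.dumps({"projectId": project_id}).encode("utf-8")
    headers = {
        "Authorization": f"Bearer {token}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def run_transfer_job(job_name, project_id, token):
    """Start the job now (needs network access and a valid token)."""
    request = build_run_request(job_name, project_id, token)
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

A scheduler entry that fires every few minutes would then call run_transfer_job with the job name and project ID of the transfer job created earlier.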

Event-driven STS for Cloud Storage

Storage Transfer Service now offers event-driven transfer, a serverless and easy-to-use replication service. STS can listen to event notifications in AWS or Google Cloud to automatically transfer data that has been added or updated in the source location. Event-driven transfers are supported from AWS S3 or Cloud Storage to Cloud Storage.

This feature is a good fit for use cases where you have a changing data source (e.g., new object insertion) that needs to be replicated to the destination in a matter of minutes.

You can trigger an event-driven replication from AWS S3 to Google Cloud for ongoing data analytics and/or machine learning.
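For readers who prefer the API over the console, an event-driven job might be described with a request body like the one built below. This is a sketch under the assumption that the job is created through the transferJobs.create REST method; the helper name and all argument values are placeholders.

```python
import json


def build_event_driven_job(project_id, s3_bucket, role_arn, gcs_bucket, sqs_queue_arn):
    """Build a transfer job body that listens to an SQS queue for S3 events."""
    return {
        "projectId": project_id,
        "status": "ENABLED",
        "transferSpec": {
            # The role is the AWS IAM role with the trust policy for Google.
            "awsS3DataSource": {"bucketName": s3_bucket, "roleArn": role_arn},
            "gcsDataSink": {"bucketName": gcs_bucket},
        },
        # For an AWS source, the event stream name is the SQS queue ARN.
        "eventStream": {"name": sqs_queue_arn},
    }


if __name__ == "__main__":
    job = build_event_driven_job(
        "my-project",
        "source-bucket",
        "arn:aws:iam::123456789101:role/sts-role",
        "destination-bucket",
        "arn:aws:sqs:us-east-1:123456789101:test-queue",
    )
    print(json.dumps(job, indent=2))
```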

[Image: event-driven configuration on Storage Transfer Service]

[Image: a new file being transferred from AWS S3 to Cloud Storage via event-driven STS]

Enabling AWS Event Notifications for SQS

On AWS S3: 

Once the setup is complete, the S3 bucket should deliver notifications to the configured SQS queue, and the configured role should be able to access both the SQS queue and the S3 bucket for event-driven transfers.
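The bucket-to-queue notification can also be expressed as a configuration document, for example for `aws s3api put-bucket-notification-configuration`. The sketch below builds that document in Python; the helper name is hypothetical, and the choice of the `s3:ObjectCreated:*` event type is an assumption that fits the "new object insertion" use case described earlier.

```python
import json


def build_notification_config(queue_arn, events=("s3:ObjectCreated:*",)):
    """Build an S3 notification configuration that targets an SQS queue."""
    return {
        "QueueConfigurations": [
            {"QueueArn": queue_arn, "Events": list(events)}
        ]
    }


if __name__ == "__main__":
    config = build_notification_config(
        "arn:aws:sqs:us-east-1:123456789101:test-queue"
    )
    print(json.dumps(config, indent=2))
```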

On AWS SQS:

{
  "Version": "2012-10-17",
  "Id": "example-ID",
  "Statement": [
    {
      "Sid": "example-statement-ID",
      "Effect": "Allow",
      "Principal": {
        "Service": "s3.amazonaws.com"
      },
      "Action": "SQS:SendMessage",
      "Resource": "arn:aws:sqs:us-east-1:123456789101:test-queue",
      "Condition": {
        "StringEquals": {
          "aws:SourceAccount": "123456789101"
        },
        "ArnLike": {
          "aws:SourceArn": "arn:aws:s3:*:*:source-bucket"
        }
      }
    }
  ]
}

Note: Replace the SQS queue ARN, the source account number, and the S3 bucket ARN with your own values.
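To keep those three values consistent, the queue policy above could be generated from the queue ARN and bucket name. This is a minimal sketch with a hypothetical helper name; it assumes the queue and the bucket belong to the same AWS account and reads the account number from the fifth field of the queue ARN.

```python
import json


def build_sqs_access_policy(queue_arn, bucket_name):
    """Build the SQS access policy above for one queue and one bucket."""
    parts = queue_arn.split(":")
    if len(parts) != 6 or parts[:3] != ["arn", "aws", "sqs"]:
        raise ValueError(f"not an SQS queue ARN: {queue_arn}")
    account_id = parts[4]  # arn:aws:sqs:<region>:<account>:<queue>
    return {
        "Version": "2012-10-17",
        "Id": "example-ID",
        "Statement": [
            {
                "Sid": "example-statement-ID",
                "Effect": "Allow",
                "Principal": {"Service": "s3.amazonaws.com"},
                "Action": "SQS:SendMessage",
                "Resource": queue_arn,
                "Condition": {
                    "StringEquals": {"aws:SourceAccount": account_id},
                    "ArnLike": {"aws:SourceArn": f"arn:aws:s3:*:*:{bucket_name}"},
                },
            }
        ],
    }


if __name__ == "__main__":
    policy = build_sqs_access_policy(
        "arn:aws:sqs:us-east-1:123456789101:test-queue", "source-bucket"
    )
    print(json.dumps(policy, indent=2))
```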

This completes the event-driven setup for Storage Transfer Service.

Summary

In this article, we used Storage Transfer Service to securely transfer data from AWS S3 to Google’s Cloud Storage. We also discussed the event-driven STS feature, which listens to event notifications in AWS to automatically transfer data that has been added or updated in the source location.
