Optimizing S3 File Transfers in AWS Lambda: A Performance Comparison

I recently ran a series of experiments to find the fastest way to upload and download files smaller than 100MB to and from S3 within AWS Lambda functions. Below are the methods I tested, the timings I measured, and what they suggest about optimizing S3 file transfers in Lambda.

Download Experiments

I tested three different methods for downloading files from S3 to Lambda. Here’s what I found:

Method 1: Get Object with Gzip Decompression

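import io
from gzip import GzipFile

import boto3

s3 = boto3.client('s3')  # assumed module-level client shared by all snippets

def download_get_object_with_gzip(bucket, key):
    response = s3.get_object(Bucket=bucket, Key=key)
    buffer = io.BytesIO(response['Body'].read())
    try:
        return GzipFile(None, 'rb', fileobj=buffer).read()
    except OSError:  # raised when the object is not gzip-compressed
        buffer.seek(0)
        return buffer.read()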

Performance:

  • Cold start: 275ms, 245ms
  • Warm start: 244ms, 245ms
  • Average (all four measurements): 252.25ms
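
Method 1 relies on an exception to detect objects that are not gzip-compressed. A possible alternative, which was not benchmarked in this post, is to check the two-byte gzip magic number before decompressing. The helper name maybe_gunzip below is hypothetical, and it works on plain bytes so it can be tried without S3:

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # the first two bytes of every gzip stream


def maybe_gunzip(data: bytes) -> bytes:
    """Return decompressed bytes if data starts with the gzip magic, else data unchanged."""
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data


raw = b"hello from S3"
print(maybe_gunzip(gzip.compress(raw)) == raw)  # compressed input is decompressed
print(maybe_gunzip(raw) == raw)                 # plain input passes through
```

One caveat: uncompressed data that happens to start with those two bytes would still raise an error, so the try/except version is the more forgiving of the two.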

Method 2: Download Fileobj

import io

def download_download_fileobj(bucket, key):
    buffer = io.BytesIO()
    s3.download_fileobj(Bucket=bucket, Key=key, Fileobj=buffer)
    # getvalue() returns the whole buffer regardless of position, so no seek is needed.
    return buffer.getvalue()

Performance:

  • Cold start: 390ms, 349ms
  • Warm start: 185ms, 166ms
  • Average (all four measurements): 272.5ms

Method 3: Optimized Download

def optimized_download(bucket, key):
    response = s3.get_object(Bucket=bucket, Key=key)
    return response['Body'].read()

Performance:

  • Cold start: 283ms, 251ms
  • Warm start: 245ms, 245ms
  • Average (all four measurements): 256ms

Upload Experiments

For uploading files from Lambda to S3, I tested four different methods. Here’s what I discovered:

Method 1: Put Object

def upload_put_object(file_path, bucket, key):
    with open(file_path, 'rb') as file:
        s3.put_object(Body=file, Bucket=bucket, Key=key)

Performance:

  • Cold start: 344ms
  • Warm start: 264ms
  • Average: 304ms

Method 2: Upload File

def upload_upload_file(file_path, bucket, key):
    s3.upload_file(file_path, bucket, key)

Performance:

  • Cold start: 438ms
  • Warm start: 387ms
  • Average: 412.5ms

Method 3: Multipart Upload

def upload_multipart(file_path, bucket, key, part_size=8*1024*1024):
    # Every part except the last must be at least 5 MB; the 8 MB default satisfies that.
    mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
    upload_id = mpu['UploadId']

    try:
        parts = []
        part_num = 1
        with open(file_path, 'rb') as f:
            # Read until EOF instead of trusting a file size taken before the loop.
            while True:
                part_data = f.read(part_size)
                if not part_data:
                    break
                part = s3.upload_part(Body=part_data, Bucket=bucket, Key=key,
                                      UploadId=upload_id, PartNumber=part_num)
                parts.append({"PartNumber": part_num, "ETag": part['ETag']})
                part_num += 1

        s3.complete_multipart_upload(Bucket=bucket, Key=key,
                                     UploadId=upload_id,
                                     MultipartUpload={"Parts": parts})
    except Exception:
        # Abort on any failure (not only ClientError) so no orphaned parts are left behind.
        s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=upload_id)
        raise

Performance:

  • Cold start: 592ms
  • Warm start: 546ms
  • Average: 569ms
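
One likely contributor to these timings is the number of sequential requests that Method 3 issues: one to create the upload, one per part, and one to complete it. The sketch below only counts requests for a given file size; it does not model latency or bandwidth, and the helper name multipart_request_count is hypothetical:

```python
import math


def multipart_request_count(file_size: int, part_size: int = 8 * 1024 * 1024) -> int:
    """Count the S3 requests a sequential multipart upload makes for file_size bytes."""
    parts = math.ceil(file_size / part_size)
    # create_multipart_upload + one upload_part per part + complete_multipart_upload
    return parts + 2


for mb in (10, 50, 100):
    size = mb * 1024 * 1024
    print(f"{mb} MB: {multipart_request_count(size)} requests (put_object needs 1)")
```

With the 8 MB default, a 100 MB file needs 13 parts, so the upload makes 15 round trips where put_object makes one.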

Method 4: Fast Upload

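import boto3
import boto3.s3.transfer as s3transfer  # s3transfer here means boto3's transfer module
import botocore.config

def fast_upload(file_path, bucket, key, workers=20):
    session = boto3.Session()
    # Size the connection pool to match the number of transfer threads.
    botocore_config = botocore.config.Config(max_pool_connections=workers)
    s3client = session.client('s3', config=botocore_config)
    transfer_config = s3transfer.TransferConfig(
        use_threads=True,
        max_concurrency=workers,
    )
    s3t = s3transfer.create_transfer_manager(s3client, transfer_config)
    try:
        # upload() only schedules the transfer; result() waits and re-raises any failure.
        s3t.upload(file_path, bucket, key).result()
    finally:
        s3t.shutdown()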

Performance:

  • Cold start: 611ms
  • Warm start: 636ms
  • Average: 623.5ms

Conclusions

After running these experiments, we can draw some interesting conclusions about the performance of different S3 file transfer methods in AWS Lambda:

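  1. For downloads, the two get_object-based methods were nearly tied: Method 1 averaged 252.25ms and the “Optimized Download” method (Method 3) averaged 256ms. Method 3 is the simpler of the two and was consistent across cold and warm starts. download_fileobj (Method 2) had the fastest warm starts (166ms and 185ms) but the slowest cold starts (349ms and 390ms), which gave it the highest average at 272.5ms.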
  2. For uploads, surprisingly, the simple “Put Object” method (Method 1) outperformed the other approaches, with an average time of 304ms.
  3. The more complex methods, such as multipart upload and the “Fast Upload” using transfer managers, were actually slower in this context. This might be due to the overhead of setting up these processes for relatively small files (under 100MB). However, these methods are expected to perform better for larger files.
  4. Cold and warm start times differ noticeably for several methods. The gap was largest for download_fileobj (roughly 200ms) and about 45ms to 80ms for the first three upload methods, while “Fast Upload” was slightly slower on its warm start (636ms vs. 611ms). This highlights the importance of considering Lambda’s execution context in your application design.
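
The averages and cold/warm gaps can be recomputed from the numbers reported above. This sketch assumes that the paired download figures are two cold and two warm measurements each, which is the reading that reproduces the stated averages:

```python
from statistics import mean

# (cold measurements, warm measurements) in milliseconds, copied from the results above.
# Assumption: each download pair is two separate measurements, not a range.
timings = {
    "download: get_object + gzip": ([275, 245], [244, 245]),
    "download: download_fileobj": ([390, 349], [185, 166]),
    "download: get_object": ([283, 251], [245, 245]),
    "upload: put_object": ([344], [264]),
    "upload: upload_file": ([438], [387]),
    "upload: multipart": ([592], [546]),
    "upload: transfer manager": ([611], [636]),
}


def summarize(cold, warm):
    """Return (overall average, cold-minus-warm gap) in milliseconds."""
    return mean(cold + warm), mean(cold) - mean(warm)


for name, (cold, warm) in timings.items():
    avg, gap = summarize(cold, warm)
    print(f"{name:30s} avg={avg:7.2f}ms  cold-warm gap={gap:+7.2f}ms")
```

A negative gap means the warm measurement was slower than the cold one, which only happens for the transfer manager upload.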

It’s worth noting that these results may vary depending on factors such as file size, network conditions, and Lambda configuration. Always test in your specific environment to ensure optimal performance.
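
For running your own tests, a minimal timing helper might look like the sketch below. It is not the harness used for the numbers in this post (that code is not shown here), and the name time_call is hypothetical. It uses a stand-in workload so it runs anywhere:

```python
import time
from statistics import mean


def time_call(func, *args, runs=5, **kwargs):
    """Call func(*args, **kwargs) `runs` times and return each duration in milliseconds."""
    durations = []
    for _ in range(runs):
        start = time.perf_counter()
        func(*args, **kwargs)
        durations.append((time.perf_counter() - start) * 1000)
    return durations


# Stand-in workload; inside a Lambda handler you would pass a transfer function
# instead, for example time_call(upload_put_object, file_path, bucket, key).
durations = time_call(sum, range(100_000), runs=3)
print(f"first={durations[0]:.2f}ms mean={mean(durations):.2f}ms")
```

Repeated calls inside one invocation only measure warm behavior. Measuring a cold start requires a fresh execution environment, for example the first invocation after deploying a new version of the function.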

That’s it! Who would have thought the simple “Put Object” method would outshine its more complex counterparts?