Recently, I ran a series of experiments to find the fastest way to upload and download files smaller than 100 MB to and from S3 within AWS Lambda functions. The results were interesting enough to share. Let’s go through the experiments and see what they tell us about optimizing S3 file transfers in Lambda.
Download Experiments
I tested three different methods for downloading files from S3 to Lambda. Here’s what I found:
Method 1: Get Object with Gzip Decompression
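import io
from gzip import GzipFile
import boto3

s3 = boto3.client('s3')  # module-level S3 client used by the functions below

def download_get_object_with_gzip(bucket, key):
    response = s3.get_object(Bucket=bucket, Key=key)
    buffer = io.BytesIO(response['Body'].read())
    try:
        # Try to decompress the payload as gzip first
        return GzipFile(None, 'rb', fileobj=buffer).read()
    except OSError:
        # Not gzip data: rewind and return the raw bytes
        buffer.seek(0)
        return buffer.read()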
Performance:
- Cold start (two runs): 275ms, 245ms
- Warm start (two runs): 244ms, 245ms
- Average (all four runs): 252.25ms
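The gzip fallback in Method 1 can be tried locally without touching S3. The sketch below is a variant, not the benchmarked code: a hypothetical helper, maybe_gunzip, checks for the two-byte gzip magic number (0x1f 0x8b) instead of catching OSError, and returns the payload unchanged when the marker is absent.

```python
import gzip

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def maybe_gunzip(data: bytes) -> bytes:
    """Return decompressed bytes if data starts with the gzip magic number,
    otherwise return data unchanged. (Hypothetical helper for illustration.)"""
    if data[:2] == GZIP_MAGIC:
        return gzip.decompress(data)
    return data


if __name__ == "__main__":
    raw = b"hello from S3"
    print(maybe_gunzip(gzip.compress(raw)) == raw)  # compressed input
    print(maybe_gunzip(raw) == raw)                 # plain input
```

Unlike the try/except version, this sketch would raise on non-gzip data that happens to start with the same two bytes, so treat it as an illustration of the idea rather than a drop-in replacement.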
Method 2: Download Fileobj
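def download_download_fileobj(bucket, key):
    # s3 is a module-level boto3.client('s3'); io is the standard-library module
    buffer = io.BytesIO()
    s3.download_fileobj(Bucket=bucket, Key=key, Fileobj=buffer)
    # getvalue() returns the full contents regardless of the stream position
    return buffer.getvalue()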
Performance:
- Cold start (two runs): 390ms, 349ms
- Warm start (two runs): 185ms, 166ms
- Average (all four runs): 272.5ms
Method 3: Optimized Download
def optimized_download(bucket, key):
    # Plain get_object: read the whole body into memory, no gzip handling
    response = s3.get_object(Bucket=bucket, Key=key)
    return response['Body'].read()
Performance:
- Cold start (two runs): 283ms, 251ms
- Warm start (two runs): 245ms, 245ms
- Average (all four runs): 256ms
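To compare the three download methods side by side, the reported numbers can be split into cold and warm means. The sketch below only re-does the arithmetic on the timings listed above; it assumes each pair of values is two separate runs, which matches the reported averages. The names DOWNLOAD_TIMINGS and summarize are made up for this illustration.

```python
from statistics import mean

# Timings in milliseconds, copied from the results above: (cold runs, warm runs).
DOWNLOAD_TIMINGS = {
    "get_object + gzip": ([275, 245], [244, 245]),
    "download_fileobj": ([390, 349], [185, 166]),
    "get_object": ([283, 251], [245, 245]),
}


def summarize(timings):
    """Return {method: (cold mean, warm mean, overall mean)} in milliseconds."""
    return {
        name: (mean(cold), mean(warm), mean(cold + warm))
        for name, (cold, warm) in timings.items()
    }


if __name__ == "__main__":
    for name, (cold, warm, overall) in summarize(DOWNLOAD_TIMINGS).items():
        print(f"{name}: cold {cold} ms, warm {warm} ms, overall {overall} ms")
```

Splitting the averages this way makes the pattern easier to see: download_fileobj has the lowest warm mean but the highest cold mean, while the two get_object variants are close on both.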
Upload Experiments
For uploading files from Lambda to S3, I tested four different methods. Here’s what I discovered:
Method 1: Put Object
def upload_put_object(file_path, bucket, key):
    # Single PUT request; the open file is passed as the request body
    with open(file_path, 'rb') as file:
        s3.put_object(Body=file, Bucket=bucket, Key=key)
Performance:
- Cold start: 344ms
- Warm start: 264ms
- Average: 304ms
Method 2: Upload File
def upload_upload_file(file_path, bucket, key):
    # Managed transfer: boto3 may switch to multipart for larger files
    s3.upload_file(file_path, bucket, key)
Performance:
- Cold start: 438ms
- Warm start: 387ms
- Average: 412.5ms
Method 3: Multipart Upload
import os

def upload_multipart(file_path, bucket, key, part_size=8*1024*1024):
    file_size = os.path.getsize(file_path)
    mpu = s3.create_multipart_upload(Bucket=bucket, Key=key)
    try:
        parts = []
        uploaded_bytes = 0
        part_num = 1
        with open(file_path, 'rb') as f:
            # Parts are uploaded one at a time, in order
            while uploaded_bytes < file_size:
                part_data = f.read(part_size)
                part = s3.upload_part(Body=part_data, Bucket=bucket, Key=key,
                                      UploadId=mpu['UploadId'], PartNumber=part_num)
                parts.append({"PartNumber": part_num, "ETag": part['ETag']})
                uploaded_bytes += len(part_data)
                part_num += 1
        s3.complete_multipart_upload(Bucket=bucket, Key=key,
                                     UploadId=mpu['UploadId'],
                                     MultipartUpload={"Parts": parts})
    except Exception:
        # Abort on any failure so incomplete parts are not left behind in the bucket
        s3.abort_multipart_upload(Bucket=bucket, Key=key, UploadId=mpu['UploadId'])
        raise
Performance:
- Cold start: 592ms
- Warm start: 546ms
- Average: 569ms
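Part of the extra time in Method 3 likely comes from the number of sequential requests it makes. As a rough sketch of that arithmetic (the helper names part_count and request_count are made up for this example), the loop issues one upload_part call per chunk, plus one call to create the upload and one to complete it:

```python
import math

DEFAULT_PART_SIZE = 8 * 1024 * 1024  # same default as upload_multipart above


def part_count(file_size: int, part_size: int = DEFAULT_PART_SIZE) -> int:
    """Number of upload_part calls needed for a file of file_size bytes."""
    return math.ceil(file_size / part_size)


def request_count(file_size: int, part_size: int = DEFAULT_PART_SIZE) -> int:
    """Total S3 requests: create + one per part + complete."""
    return part_count(file_size, part_size) + 2


if __name__ == "__main__":
    for size_mib in (1, 8, 50, 100):
        size = size_mib * 1024 * 1024
        print(f"{size_mib} MiB: {part_count(size)} parts, "
              f"{request_count(size)} requests")
```

A single put_object call is one request regardless of size, so for small files the multipart bookkeeping alone adds at least two extra round trips. Note that S3 requires every part except the last to be at least 5 MiB, so part_size cannot be shrunk arbitrarily.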
Method 4: Fast Upload
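import boto3
import botocore.config
import boto3.s3.transfer as s3transfer  # TransferConfig and create_transfer_manager live here

def fast_upload(file_path, bucket, key, workers=20):
    session = boto3.Session()
    # Size the connection pool to match the number of worker threads
    botocore_config = botocore.config.Config(max_pool_connections=workers)
    s3client = session.client('s3', config=botocore_config)
    transfer_config = s3transfer.TransferConfig(
        use_threads=True,
        max_concurrency=workers,
    )
    s3t = s3transfer.create_transfer_manager(s3client, transfer_config)
    s3t.upload(file_path, bucket, key)
    s3t.shutdown()  # waits for the queued upload to finish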
Performance:
- Cold start: 611ms
- Warm start: 636ms
- Average: 623.5ms
Conclusions
After running these experiments, we can draw some interesting conclusions about the performance of different S3 file transfer methods in AWS Lambda:
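- For downloads, the “Optimized Download” method (Method 3) offers a good balance of simplicity and consistent performance across cold and warm starts, averaging 256ms. Method 1 was marginally faster on average (252.25ms), a difference too small to be meaningful with so few runs, while “Download Fileobj” (Method 2) had the fastest warm starts (166ms to 185ms) but the slowest cold starts (349ms to 390ms).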
- For uploads, surprisingly, the simple “Put Object” method (Method 1) outperformed the other approaches, with an average time of 304ms.
- The more complex methods, such as multipart upload and the “Fast Upload” using transfer managers, actually performed slower in this context. This might be due to the overhead of setting up these processes for relatively small files (under 100MB). However, these methods are expected to perform better for larger files.
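- There’s a noticeable difference between cold and warm start times for most methods. The gap was largest for “Download Fileobj” (349ms to 390ms cold versus 166ms to 185ms warm). Among the uploads it ranged from about 45ms to 80ms, except for “Fast Upload”, whose warm run was slightly slower than its cold run. This highlights the importance of considering Lambda’s execution context in your application design.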
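To tell cold and warm invocations apart when collecting numbers like these, one common pattern is a module-level flag: module code runs once per execution environment, so the flag is only true on the first invocation. This is a minimal sketch and not the code used for the measurements above; the handler name and return shape are assumptions.

```python
_is_cold = True  # set once when the module is first loaded


def handler(event, context):
    """Report whether this invocation is the first in its execution environment."""
    global _is_cold
    cold, _is_cold = _is_cold, False
    # ... the S3 transfer under test would run here ...
    return {"cold_start": cold}


if __name__ == "__main__":
    print(handler({}, None))  # first call in this process
    print(handler({}, None))  # later calls reuse the loaded module
```

The same reasoning applies to expensive setup such as creating the S3 client: anything done at module level is paid for on the cold start and reused on warm ones.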
It’s worth noting that these results may vary depending on factors such as file size, network conditions, and Lambda configuration. Always test in your specific environment to ensure optimal performance.
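If you want to run your own comparison, a small timing helper is enough to get started. The sketch below is one possible approach, not the harness used for the numbers in this post: time_call is a hypothetical helper that calls a function several times and returns each duration in milliseconds. The built-in sum is used as a stand-in workload so the example runs without AWS credentials.

```python
import time
from statistics import mean


def time_call(func, *args, repeats=5, **kwargs):
    """Call func(*args, **kwargs) `repeats` times; return durations in ms."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(*args, **kwargs)
        durations.append((time.perf_counter() - start) * 1000)
    return durations


if __name__ == "__main__":
    # Stand-in workload; swap in one of the upload or download functions.
    samples = time_call(sum, range(1_000_000), repeats=3)
    print(f"mean {mean(samples):.2f} ms over {len(samples)} runs")
```

In a Lambda function you would pass one of the transfer functions instead, for example time_call(upload_put_object, file_path, bucket, key), and log the individual samples rather than only the mean so that outliers stay visible.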
That’s it! Who would have thought the simple “Put Object” method would outshine its more complex counterparts?