We run Lambda functions in production across multiple client accounts. Some handle 50 requests per day. Others handle 50 million. The best practices that matter depend heavily on scale, but the fundamentals apply everywhere.
This is not a rehash of the AWS documentation. These are the practices we actually follow, the mistakes we have fixed, and the patterns that work in 2026.
Cold Start Optimisation
Cold starts are Lambda’s biggest practical limitation. A cold start happens when Lambda creates a new execution environment — loading your code, initialising the runtime, and running your handler for the first time.
Actual Cold Start Times (2026 Benchmarks)
| Runtime | Cold Start (p50) | Cold Start (p99) | Notes |
|---|---|---|---|
| Python 3.13 | 200-400ms | 800ms-1.2s | Fastest scripting runtime |
| Node.js 22 | 200-350ms | 600ms-1s | Good general choice |
| Go | 50-100ms | 150-250ms | Near-zero cold starts |
| Rust | 50-80ms | 100-200ms | Fastest overall |
| Java 21 | 2-5s | 6-10s | Without SnapStart |
| Java 21 + SnapStart | 90-140ms | 200-400ms | Dramatically better |
| .NET 8 (Native AOT) | 200-400ms | 500-800ms | AOT required for good performance |
Fix 1: Use ARM64 (Graviton2)
Switch every function to ARM64. It is a one-line change that improves both performance and cost:
```yaml
# SAM template
Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Runtime: python3.13
      Architectures:
        - arm64   # 20% cheaper, 15-40% faster
      Handler: app.handler
```
ARM64 functions are 20% cheaper per GB-second and run 15-40% faster than x86 equivalents. There is no reason to use x86 for new Lambda functions in 2026 unless you have a compiled dependency that does not support ARM.
Fix 2: Minimise Package Size
Every megabyte of deployment package adds to cold start time. The runtime has to download, decompress, and load your code:
```bash
# Python: Use Lambda layers for large dependencies

# Bad: 50MB deployment package
pip install pandas numpy scipy -t ./package/

# Good: Split into a layer (loaded once, cached)
pip install pandas numpy scipy -t ./layer/python/
zip -r layer.zip layer/
aws lambda publish-layer-version \
  --layer-name data-deps \
  --zip-file fileb://layer.zip \
  --compatible-runtimes python3.13
```
```typescript
// Node.js: Tree-shake with esbuild
// Bad: node_modules with 200MB of unused code
// Good: Bundled to a single file

// esbuild config
import { build } from 'esbuild';

await build({
  entryPoints: ['src/handler.ts'],
  bundle: true,
  minify: true,
  platform: 'node',
  target: 'node22',
  outfile: 'dist/handler.js',
  external: ['@aws-sdk/*'], // AWS SDK v3 is included in the runtime
});
```
Fix 3: Initialise Outside the Handler
Code outside the handler function runs once during cold start and is reused for subsequent invocations:
```python
# Good: Clients created once, reused across invocations
import json
import os

import boto3
import psycopg2

# These run ONCE during cold start
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table(os.environ['TABLE_NAME'])

# Database connection (reused across warm invocations)
conn = psycopg2.connect(
    host=os.environ['DB_HOST'],
    dbname=os.environ['DB_NAME'],
    user=os.environ['DB_USER'],
    password=os.environ['DB_PASSWORD'],
)

def handler(event, context):
    # This runs on EVERY invocation;
    # conn and table are already initialised
    result = table.get_item(Key={'id': event['id']})
    return {
        'statusCode': 200,
        # API Gateway proxy responses need a string body
        'body': json.dumps(result.get('Item', {}), default=str),
    }
```
Fix 4: Use SnapStart for Java
If you run Java on Lambda, SnapStart is mandatory. It creates a snapshot of the initialised execution environment, reducing cold starts from several seconds to around 90-140ms:
```yaml
Resources:
  JavaFunction:
    Type: AWS::Serverless::Function
    Properties:
      Runtime: java21
      Architectures: [arm64]
      SnapStart:
        ApplyOn: PublishedVersions
```
Fix 5: Provisioned Concurrency for Latency-Critical Paths
For user-facing APIs where cold starts are unacceptable, pre-warm execution environments:
```yaml
Resources:
  ApiFunction:
    Type: AWS::Serverless::Function
    Properties:
      AutoPublishAlias: live
      ProvisionedConcurrencyConfig:
        ProvisionedConcurrentExecutions: 10
```
Cost warning: Provisioned Concurrency charges whether the functions are invoked or not. At 10 concurrent instances of a 1GB function, you pay approximately $80-120/month. Only use this for latency-critical, high-traffic endpoints. For everything else, accept the occasional cold start.
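As a rough sanity check, the monthly charge can be estimated from the GB-seconds you reserve. A minimal sketch, assuming an illustrative per-GB-second rate (check current regional pricing before relying on it):

```python
# Rough monthly cost estimate for Provisioned Concurrency.
RATE_PER_GB_SECOND = 0.0000041667  # USD; assumption, verify against current pricing
SECONDS_PER_MONTH = 30 * 24 * 3600

def provisioned_concurrency_cost(instances: int, memory_mb: int) -> float:
    """Monthly standing cost of reserving `instances` warm environments."""
    gb = memory_mb / 1024
    return instances * gb * SECONDS_PER_MONTH * RATE_PER_GB_SECOND

# 10 instances at 1024MB lands at roughly $108/month,
# before any invocation and duration charges on top
print(round(provisioned_concurrency_cost(10, 1024), 2))
```

This is the standing cost only; normal duration charges still apply when the pre-warmed environments actually serve requests.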
Memory and Performance Tuning
Lambda allocates CPU proportionally to memory. More memory = more CPU = faster execution. Sometimes increasing memory reduces both execution time AND cost:
The Memory-Cost Experiment
Run your function at different memory settings and measure:
```text
# Use AWS Lambda Power Tuning (open-source tool)
# Deploy: https://github.com/alexcasalboni/aws-lambda-power-tuning
# It runs your function at 128MB, 256MB, 512MB, 1024MB, etc.
# and charts execution time vs cost

# Common finding:
# 128MB:  3000ms execution, ~$0.00000625 per invocation
# 512MB:   800ms execution, ~$0.00000667 per invocation (7% more expensive, ~4x faster)
# 1024MB:  400ms execution, ~$0.00000667 per invocation (same cost, 7.5x faster!)
```
The sweet spot: For most functions, 512MB-1024MB provides the best cost-performance ratio. Going below 256MB rarely saves money because execution time increases proportionally.
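This pattern falls straight out of how Lambda bills duration: allocated GB multiplied by billed seconds. A quick sketch of the arithmetic, using an illustrative per-GB-second rate (an assumption, not current pricing):

```python
# Per-invocation duration cost = allocated GB × billed seconds × rate.
RATE_PER_GB_SECOND = 0.0000166667  # USD; illustrative rate, verify for your region

def invocation_cost(memory_mb: int, duration_ms: float) -> float:
    """Duration cost of one invocation, ignoring the flat request charge."""
    return (memory_mb / 1024) * (duration_ms / 1000) * RATE_PER_GB_SECOND

low = invocation_cost(128, 3000)    # small and slow: 0.375 GB-seconds
high = invocation_cost(1024, 400)   # large and fast: 0.400 GB-seconds

# Both configurations consume about 0.4 GB-seconds, so the 1024MB
# version is ~7.5x quicker for nearly the same per-invocation cost.
print(f"{low:.8f} {high:.8f}")
```

The key intuition: if doubling memory more than halves duration, the bigger setting is strictly cheaper.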
Right-Size with AWS Compute Optimiser
Enable Compute Optimiser for Lambda — it analyses your functions and recommends memory settings based on actual usage patterns. This is free and often identifies functions that are either over-provisioned or under-provisioned.
Architecture Patterns
Pattern 1: API Gateway + Lambda (Synchronous)
The standard serverless API pattern:
```
Client → API Gateway → Lambda → DynamoDB/RDS
                              → Return response
```
Best practices:
- Use API Gateway HTTP APIs (not REST APIs) — up to 71% cheaper, lower latency
- Enable response caching for read-heavy endpoints
- Use Lambda Proxy integration for simpler code
- Set appropriate timeouts (API Gateway: 29s max, Lambda: match or lower)
Pattern 2: Event-Driven Processing (Asynchronous)
```
S3 Upload   → SQS Queue → Lambda → Process → Store result
SNS Topic   → Lambda    → Send notification
EventBridge → Lambda    → Scheduled task
```
Best practices:
- Always use SQS between event source and Lambda for buffering and retry
- Configure Dead Letter Queues (DLQ) for failed messages
- Set `maxBatchingWindow` to batch events and reduce invocations
- Use `ReservedConcurrentExecutions` to prevent downstream overload
```yaml
Resources:
  ProcessorFunction:
    Type: AWS::Serverless::Function
    Properties:
      ReservedConcurrentExecutions: 50   # Protect downstream services
      Events:
        SQSTrigger:
          Type: SQS
          Properties:
            Queue: !GetAtt ProcessingQueue.Arn
            BatchSize: 10
            MaximumBatchingWindowInSeconds: 5

  ProcessingQueue:
    Type: AWS::SQS::Queue
    Properties:
      # For SQS event sources, failed messages reach the DLQ
      # via the queue's redrive policy
      RedrivePolicy:
        deadLetterTargetArn: !GetAtt DLQ.Arn
        maxReceiveCount: 3
```
Pattern 3: Fan-Out (Parallel Processing)
```
Input → Lambda (coordinator) → SQS → N × Lambda (workers) → Aggregate
```
Use Step Functions for complex orchestration with error handling, retries, and parallel execution branches. Lambda alone cannot coordinate multi-step workflows reliably.
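For the simple fan-out case, the coordinator just pushes work items onto the queue. A minimal sketch (the queue URL and item shape are assumptions); note that SQS `SendMessageBatch` accepts at most 10 entries per call, so the work list must be chunked:

```python
import json

def chunk(items, size=10):
    """Yield slices of at most `size` items (SQS batch limit is 10)."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def build_entries(batch):
    # Ids only need to be unique within a single batch request
    return [
        {"Id": str(i), "MessageBody": json.dumps(item)}
        for i, item in enumerate(batch)
    ]

def fan_out(sqs_client, queue_url, work_items):
    """Coordinator: enqueue every work item for the worker Lambdas."""
    for batch in chunk(work_items):
        sqs_client.send_message_batch(
            QueueUrl=queue_url,
            Entries=build_entries(batch),
        )
```

The client is passed in rather than created inside, which keeps the batching logic testable without AWS credentials.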
Observability
Lambda functions are black boxes without proper observability. Here is our standard setup:
Structured Logging
```python
import json
import logging

logger = logging.getLogger()
logger.setLevel(logging.INFO)

def handler(event, context):
    # Structured JSON logging
    logger.info(json.dumps({
        "message": "Processing request",
        "request_id": context.aws_request_id,
        "function_name": context.function_name,
        "memory_limit": context.memory_limit_in_mb,
        "event_source": event.get("source", "unknown"),
    }))

    # Your logic here
    result = process(event)

    logger.info(json.dumps({
        "message": "Request completed",
        "request_id": context.aws_request_id,
        "items_processed": len(result),
    }))
    return result
```
Metrics with CloudWatch Embedded Metrics Format
```python
import time

from aws_embedded_metrics import metric_scope

@metric_scope
def handler(event, context, metrics):
    metrics.set_namespace("MyApp")
    metrics.put_dimensions({"Service": "OrderProcessor"})

    start = time.time()
    result = process_order(event)
    duration = time.time() - start

    metrics.put_metric("ProcessingDuration", duration, "Seconds")
    metrics.put_metric("OrdersProcessed", 1, "Count")
    if result.get("error"):
        metrics.put_metric("ProcessingErrors", 1, "Count")
    return result
```
Distributed Tracing
Enable X-Ray tracing for every function. It adds 1-2ms overhead but gives you end-to-end visibility:
```yaml
Globals:
  Function:
    Tracing: Active

Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Policies:
        - AWSXRayDaemonWriteAccess
```
For more advanced observability, integrate with Prometheus and Grafana via CloudWatch metric streams.
Security Best Practices
Least-Privilege IAM
Every Lambda function gets its own IAM role with only the permissions it needs:
```yaml
# Bad: Wildcard permissions
Policies:
  - Statement:
      - Effect: Allow
        Action: "dynamodb:*"
        Resource: "*"

# Good: Specific actions on specific resources
Policies:
  - Statement:
      - Effect: Allow
        Action:
          - dynamodb:GetItem
          - dynamodb:PutItem
        Resource: !GetAtt OrdersTable.Arn
```
Secrets Management
Never put secrets in environment variables as plaintext. Use AWS Secrets Manager or Parameter Store with the Lambda extension:
```python
import json
from functools import lru_cache

import boto3

@lru_cache(maxsize=1)
def get_secret(secret_name):
    client = boto3.client('secretsmanager')
    response = client.get_secret_value(SecretId=secret_name)
    return json.loads(response['SecretString'])

# Called once, cached for the lifetime of the execution environment
db_credentials = get_secret('prod/db-credentials')
```
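One caveat with `lru_cache`: the secret stays cached for the lifetime of the execution environment, so a rotated secret is not picked up until the environment recycles. A minimal sketch of time-bounded caching instead (the 5-minute TTL is an assumption; the fetch function is injected to keep the logic testable):

```python
import time

_CACHE = {}          # secret_name -> (value, fetched_at)
_TTL_SECONDS = 300   # assumption: tolerate up to 5 minutes of staleness

def get_secret_with_ttl(secret_name, fetch, now=time.time):
    """Return a cached secret, re-fetching once it is older than the TTL.

    `fetch` is whatever actually calls Secrets Manager; passing it in
    means this caching logic can be tested without AWS access.
    """
    cached = _CACHE.get(secret_name)
    if cached and now() - cached[1] < _TTL_SECONDS:
        return cached[0]
    value = fetch(secret_name)
    _CACHE[secret_name] = (value, now())
    return value
```

With this in place, rotating a secret propagates to warm environments within the TTL rather than requiring a redeploy.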
VPC Considerations
Only put Lambda in a VPC if it needs to access VPC resources (RDS, ElastiCache, internal services). VPC-attached Lambdas used to have terrible cold starts, but Hyperplane ENIs have mostly resolved this. Still, avoid VPC attachment unless necessary.
Cost Optimisation
1. Right-Size Memory
As covered above, more memory is often cheaper because execution is faster. Use AWS Lambda Power Tuning to find the optimal setting.
2. Use ARM64 Everywhere
20% price reduction, no code changes for most runtimes. This is free money.
3. Batch Events
Process multiple records per invocation instead of one. With a batch size of 10, SQS batching can cut the invocation count by up to 10x:
```yaml
Events:
  SQSTrigger:
    Type: SQS
    Properties:
      BatchSize: 10                        # Process up to 10 messages per invocation
      MaximumBatchingWindowInSeconds: 30   # Wait up to 30s to fill the batch
```
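With batching, one bad record should not force a retry of the whole batch. If the event source mapping enables `ReportBatchItemFailures` (via `FunctionResponseTypes`), the handler can report only the failed message IDs. A sketch, with `process_record` standing in for real work:

```python
def process_record(body):
    # Placeholder for real processing; raise to signal a failed record
    if body == "bad":
        raise ValueError("cannot process record")

def handler(event, context):
    """SQS batch handler that reports partial failures.

    With ReportBatchItemFailures enabled, only the messages listed in
    batchItemFailures are retried; the rest are deleted from the queue.
    """
    failures = []
    for record in event["Records"]:
        try:
            process_record(record["body"])
        except Exception:
            failures.append({"itemIdentifier": record["messageId"]})
    return {"batchItemFailures": failures}
```

Without this, any single failure would return the entire batch to the queue and reprocess the successful records too.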
4. Avoid Lambda for Steady-State
If a function runs continuously (invoked every second, 24/7), consider moving it to Kubernetes or Fargate. Lambda’s per-invocation pricing loses to always-on compute at sustained high throughput.
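A rough break-even check makes the point. All rates below are illustrative assumptions, not current pricing, but the ratio is what matters: a 1GB function that is effectively busy around the clock costs several times an always-on Fargate task of comparable size.

```python
# Illustrative rates only; check current regional pricing.
LAMBDA_GB_SECOND = 0.0000166667   # USD per GB-second (assumption)
FARGATE_VCPU_HOUR = 0.04048       # USD per vCPU-hour (assumption)
FARGATE_GB_HOUR = 0.004445        # USD per GB-hour (assumption)
HOURS = 730                       # hours in an average month

# Lambda: 1GB function effectively running 100% of the time
lambda_monthly = 1.0 * HOURS * 3600 * LAMBDA_GB_SECOND

# Fargate: 0.25 vCPU / 0.5GB task, always on
fargate_monthly = (0.25 * HOURS * FARGATE_VCPU_HOUR
                   + 0.5 * HOURS * FARGATE_GB_HOUR)

# Sustained-load Lambda comes out several times more expensive
print(round(lambda_monthly), round(fargate_monthly))
```

The picture reverses at low or bursty utilisation, where Lambda's scale-to-zero wins; the crossover is worth computing for your own traffic profile.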
5. Monitor with Cost Allocation Tags
Tag every function with team, project, and environment:
```yaml
Resources:
  MyFunction:
    Type: AWS::Serverless::Function
    Properties:
      Tags:
        Team: payments
        Project: order-processing
        Environment: production
```
Then track costs per tag in AWS Cost Explorer.
Building Serverless on AWS?
We design and deploy serverless architectures on AWS — from single-function APIs to complex event-driven systems processing millions of events daily.
Our AWS managed services cover:
- Serverless architecture design — API Gateway, Lambda, DynamoDB, SQS, Step Functions
- Performance optimisation — cold start reduction, memory tuning, provisioned concurrency
- Observability setup — structured logging, X-Ray tracing, custom CloudWatch dashboards
- Cost optimisation — right-sizing, ARM64 migration, batch processing
- Hybrid architecture — combine Lambda with EKS for optimal cost and performance