Running Stache on AWS with Stache Serverless
Jonathan Penny
I’ve released Stache Serverless - a production-ready way to run Stache on AWS infrastructure.
Architecture
The stack uses:
- Lambda - FastAPI via Mangum for serverless compute
- S3 Vectors - Vector database (GA December 2025)
- DynamoDB - Metadata and namespace storage
- Bedrock - Claude 3.5 Sonnet LLM + Cohere embeddings
- BedrockAgentCore Gateway - OAuth + MCP integration
S3 Vectors in Production
I’ve been running S3 Vectors since it hit GA. Here’s my honest assessment:
What works well:
- Sub-100ms query performance, tested to 100k vectors
- Zero outages or data loss
- ~$25/month for 100k vectors + 1M queries
- Simple boto3 API with IAM authentication
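That boto3 API is small enough to sketch. The service name ("s3vectors") and the parameter shapes below follow the AWS API as I understand it; treat them as assumptions and verify against the boto3 docs for your SDK version.

```python
# Assemble the keyword arguments for a query against the boto3
# 's3vectors' client. Parameter names here (vectorBucketName, indexName,
# queryVector, topK) are assumptions based on the AWS API docs.
def build_query_request(bucket: str, index: str,
                        embedding: list[float], top_k: int = 5) -> dict:
    return {
        "vectorBucketName": bucket,
        "indexName": index,
        "queryVector": {"float32": embedding},
        "topK": top_k,
        "returnMetadata": True,
    }

# With IAM credentials in place, the call itself is one line:
# client = boto3.client("s3vectors")
# resp = client.query_vectors(**build_query_request("docs", "main", emb))
req = build_query_request("docs", "main", [0.1, 0.2, 0.3])
```

Authentication is plain IAM, so there are no API keys or connection strings to manage.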
Limitations:
- 2KB metadata limit per filterable key
- list_vectors lacks metadata filtering
- Sparse documentation and limited community resources
- No cross-region replication
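The 2KB metadata limit is the one most likely to bite silently, so it pays to validate before writing. A minimal pre-flight check, assuming the limit applies to the JSON-serialized size of the filterable keys (the exact accounting is an assumption; check the S3 Vectors docs):

```python
import json

# Assumed limit semantics: 2048 bytes of UTF-8 JSON per filterable
# metadata payload. Verify against the official S3 Vectors quotas.
FILTERABLE_METADATA_LIMIT = 2048  # bytes

def check_filterable_metadata(metadata: dict) -> None:
    """Raise ValueError if serialized metadata would exceed the limit."""
    size = len(json.dumps(metadata).encode("utf-8"))
    if size > FILTERABLE_METADATA_LIMIT:
        raise ValueError(
            f"Filterable metadata is {size} bytes; "
            f"limit is {FILTERABLE_METADATA_LIMIT}"
        )

# Run the check before put_vectors so failures surface client-side:
# check_filterable_metadata(meta)
# client.put_vectors(vectorBucketName=..., indexName=..., vectors=[...])
check_filterable_metadata({"doc_id": "abc123", "namespace": "default"})
```

Catching this client-side gives a clear error message instead of a generic API rejection.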
Performance
- Cold start: 2-3 seconds
- Warm Lambda: 100-200ms
- RAG pipeline: 350ms ingestion, ~3.5s with synthesis
- Monthly cost: ~$208 for 100k documents and 1M requests
Design Decisions
The system uses provider patterns so components stay swappable: vector stores, LLMs, and embedding providers can each be replaced without touching the core code. Auto-split embeddings handle token limits automatically, and middleware plugins provide extensibility.
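The provider pattern can be sketched as a small interface plus interchangeable implementations. The names below (VectorStore, InMemoryStore) are hypothetical illustrations, not the actual stache-serverless interfaces:

```python
from typing import Protocol, Sequence

class VectorStore(Protocol):
    """Hypothetical provider interface; any store with these two
    methods can be dropped into the pipeline."""
    def upsert(self, key: str, vector: Sequence[float]) -> None: ...
    def query(self, vector: Sequence[float], top_k: int) -> list[str]: ...

class InMemoryStore:
    """Toy provider for local testing; an S3 Vectors-backed class with
    the same methods would be the production drop-in."""
    def __init__(self) -> None:
        self._data: dict[str, list[float]] = {}

    def upsert(self, key: str, vector: Sequence[float]) -> None:
        self._data[key] = list(vector)

    def query(self, vector: Sequence[float], top_k: int) -> list[str]:
        # Rank by squared Euclidean distance (cosine omitted for brevity).
        def dist(item: tuple[str, list[float]]) -> float:
            _, v = item
            return sum((a - b) ** 2 for a, b in zip(vector, v))
        return [k for k, _ in sorted(self._data.items(), key=dist)][:top_k]

store: VectorStore = InMemoryStore()
store.upsert("a", [1.0, 0.0])
store.upsert("b", [0.0, 1.0])
print(store.query([0.9, 0.1], top_k=1))  # → ['a']
```

Because callers depend only on the interface, swapping the backing store is a one-line change at construction time.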
Deployment
Everything deploys via a single SAM template. You can also run local development against real AWS services - no mocking required.
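For orientation, a SAM function resource for this kind of stack typically looks like the fragment below. This is an illustrative sketch only; the resource names, handler path, and property values are assumptions, not copied from the actual stache-serverless template:

```yaml
# Illustrative fragment - names and values are assumptions.
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Resources:
  ApiFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: app.handler        # FastAPI app wrapped by Mangum
      Runtime: python3.12
      MemorySize: 512
      Timeout: 30
      Events:
        HttpApi:
          Type: HttpApi           # routes all paths to the function
```

A single `sam deploy` from a template like this provisions the function, API routing, and IAM wiring together.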
Check out stache-serverless on GitHub and the full discussion for details.