Running Stache on AWS with Stache Serverless
Jonathan Penny
I’ve released Stache Serverless - a production-ready way to run Stache on AWS infrastructure.
Architecture
The stack uses:
- Lambda - FastAPI via Mangum for serverless compute
- S3 Vectors - Vector database (GA December 2025)
- DynamoDB - Metadata and namespace storage
- Bedrock - Claude 3.5 Sonnet LLM + Cohere embeddings
- BedrockAgentCore Gateway - OAuth + MCP integration
S3 Vectors in Production
I’ve been running S3 Vectors since it hit GA. Here’s my honest assessment:
What works well:
- Sub-100ms query performance, tested to 100k vectors
- Zero outages or data loss
- ~$25/month for 100k vectors + 1M queries
- Simple boto3 API with IAM authentication
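That boto3 API is small enough to sketch. The service name ("s3vectors") and the parameter shapes below follow the AWS API as I understand it; treat them as assumptions and verify against the boto3 docs for your SDK version.

```python
# Assemble the keyword arguments for a query against the boto3
# 's3vectors' client. Parameter names here (vectorBucketName, indexName,
# queryVector, topK) are assumptions based on the AWS API docs.
def build_query_request(bucket: str, index: str,
                        embedding: list[float], top_k: int = 5) -> dict:
    return {
        "vectorBucketName": bucket,
        "indexName": index,
        "queryVector": {"float32": embedding},
        "topK": top_k,
        "returnMetadata": True,
    }

# With IAM credentials in place, the call itself is one line:
# client = boto3.client("s3vectors")
# resp = client.query_vectors(**build_query_request("docs", "main", emb))
req = build_query_request("docs", "main", [0.1, 0.2, 0.3])
```

Authentication is plain IAM, so there are no API keys or connection strings to manage.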
Limitations:
- 2KB metadata limit per filterable key
- list_vectors lacks metadata filtering
- Sparse documentation and limited community resources
- No cross-region replication
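The 2KB metadata limit is the one most likely to bite silently, so it pays to validate before writing. A minimal pre-flight check, assuming the limit applies to the JSON-serialized size of the filterable keys (the exact accounting is an assumption; check the S3 Vectors docs):

```python
import json

# Assumed limit semantics: 2048 bytes of UTF-8 JSON per filterable
# metadata payload. Verify against the official S3 Vectors quotas.
FILTERABLE_METADATA_LIMIT = 2048  # bytes

def check_filterable_metadata(metadata: dict) -> None:
    """Raise ValueError if serialized metadata would exceed the limit."""
    size = len(json.dumps(metadata).encode("utf-8"))
    if size > FILTERABLE_METADATA_LIMIT:
        raise ValueError(
            f"Filterable metadata is {size} bytes; "
            f"limit is {FILTERABLE_METADATA_LIMIT}"
        )

# Run the check before put_vectors so failures surface client-side:
# check_filterable_metadata(meta)
# client.put_vectors(vectorBucketName=..., indexName=..., vectors=[...])
check_filterable_metadata({"doc_id": "abc123", "namespace": "default"})
```

Catching this client-side gives a clear error message instead of a generic API rejection.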
Performance
- Cold start: 2-3 seconds
- Warm Lambda: 100-200ms
- RAG pipeline: 350ms ingestion, ~3.5s with synthesis
- Monthly cost: ~$208 for 100k documents and 1M requests
Design Decisions
The system uses provider patterns so components stay swappable: vector stores, LLMs, and embedding providers can each be replaced without touching the core code. Auto-split embeddings handle token limits automatically, and middleware plugins provide extensibility.
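The provider pattern can be sketched as a small interface plus interchangeable implementations. The names below (VectorStore, InMemoryStore) are hypothetical illustrations, not the actual stache-serverless interfaces:

```python
from typing import Protocol, Sequence

class VectorStore(Protocol):
    """Hypothetical provider interface; any store with these two
    methods can be dropped into the pipeline."""
    def upsert(self, key: str, vector: Sequence[float]) -> None: ...
    def query(self, vector: Sequence[float], top_k: int) -> list[str]: ...

class InMemoryStore:
    """Toy provider for local testing; an S3 Vectors-backed class with
    the same methods would be the production drop-in."""
    def __init__(self) -> None:
        self._data: dict[str, list[float]] = {}

    def upsert(self, key: str, vector: Sequence[float]) -> None:
        self._data[key] = list(vector)

    def query(self, vector: Sequence[float], top_k: int) -> list[str]:
        # Rank by squared Euclidean distance (cosine omitted for brevity).
        def dist(item: tuple[str, list[float]]) -> float:
            _, v = item
            return sum((a - b) ** 2 for a, b in zip(vector, v))
        return [k for k, _ in sorted(self._data.items(), key=dist)][:top_k]

store: VectorStore = InMemoryStore()
store.upsert("a", [1.0, 0.0])
store.upsert("b", [0.0, 1.0])
print(store.query([0.9, 0.1], top_k=1))  # → ['a']
```

Because callers depend only on the interface, swapping the backing store is a one-line change at construction time.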
Deployment
Everything deploys via a single SAM template. You can also run local development against real AWS services - no mocking required.
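For orientation, a SAM function resource for this kind of stack typically looks like the fragment below. This is an illustrative sketch only; the resource names, handler path, and property values are assumptions, not copied from the actual stache-serverless template:

```yaml
# Illustrative fragment - names and values are assumptions.
AWSTemplateFormatVersion: '2010-09-09'
Transform: AWS::Serverless-2016-10-31
Resources:
  ApiFunction:
    Type: AWS::Serverless::Function
    Properties:
      Handler: app.handler        # FastAPI app wrapped by Mangum
      Runtime: python3.12
      MemorySize: 512
      Timeout: 30
      Events:
        HttpApi:
          Type: HttpApi           # routes all paths to the function
```

A single `sam deploy` from a template like this provisions the function, API routing, and IAM wiring together.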
Check out stache-serverless on GitHub and the full discussion for details.