A practical introduction to formal verification using TLA+ through a file upload idempotency problem.
TLA+ (Temporal Logic of Actions) is a formal specification language for modeling and verifying concurrent and distributed systems. It helps you:
- Model system behavior precisely using mathematical logic
- Find bugs before implementation through exhaustive state exploration
- Verify correctness properties like safety and liveness
- Document assumptions and invariants clearly
TLA+ is particularly powerful for systems where race conditions, edge cases, and complex state interactions can cause subtle bugs.
This demo models a common issue: ensuring file uploads are idempotent (safe to retry). We compare two approaches:
- Buggy: Random UUID keys → duplicates on retry
- Fixed: Content-addressed keys → idempotent uploads
├── specs/
│ ├── Uploads_bug.tla # Models the buggy approach
│ ├── Uploads_bug.cfg # Configuration for model checking
│ ├── Uploads_fixed.tla # Models the fixed approach
│ ├── Uploads_fixed.cfg # Configuration for fixed version
│ └── states/ # TLA+ model checker output
├── api/
│ ├── app.py # FastAPI implementation (both versions)
│ ├── dockerfile # Container for the API
│ └── requirements.txt # Python dependencies
└── docker-compose.yml # Local test environment
- Install VS Code
- Install the TLA+ extension
- The extension includes TLC model checker and PlusCal translator
- Download from TLA+ official site
- Extract and run the toolbox application
- Import specifications as new projects
# Install Java 8+ first
sudo apt install openjdk-11-jdk # Ubuntu/Debian
brew install openjdk@11 # macOS
# Download TLA+ tools
wget https://github.com/tlaplus/tlaplus/releases/download/v1.8.0/tla2tools.jar
# Run model checker
java -cp tla2tools.jar tlc2.TLC Uploads_bug.tla
The Problem:
RetrySame ==
\E c \in Contents, k1, k2 \in Keys :
/\ k1 # k2 \* Different keys
/\ [content |-> c, key |-> k1] \in objects \* Same content exists
/\ objects' = objects \cup {[content |-> c, key |-> k2]} \* Add duplicate
Invariant (should hold but doesn't):
AtMostOneKeyPerContent ==
\A o1, o2 \in objects : o1.content = o2.content => o1.key = o2.key
Result: TLA+ finds a counterexample showing the invariant violation.
The Solution:
UploadDet ==
\E c \in Contents :
/\ objects' = objects \cup {[content |-> c, key |-> c]} \* Key = content
Result: TLA+ verifies the invariant always holds.
cd specs/
# Using VS Code: Open Uploads_bug.tla and run "TLA+: Check model with TLC"
# Or command line:
java -cp tla2tools.jar tlc2.TLC -config Uploads_bug.cfg Uploads_bug.tla
Expected output: Invariant violation found in 2-3 states.
# Using VS Code: Open Uploads_fixed.tla and run "TLA+: Check model with TLC"
# Or command line:
java -cp tla2tools.jar tlc2.TLC -config Uploads_fixed.cfg Uploads_fixed.tla
Expected output: No errors found, invariant holds.
Test the real FastAPI implementation that mirrors the TLA+ models:
# Start MinIO and API
docker-compose up
# Test buggy endpoint (creates duplicates)
curl -X POST -F "file=@somefile.txt" http://localhost:8000/upload_buggy
curl -X POST -F "file=@somefile.txt" http://localhost:8000/upload_buggy # Different key!
# Test fixed endpoint (idempotent)
curl -X POST -F "file=@somefile.txt" http://localhost:8000/upload_hashed
curl -X POST -F "file=@somefile.txt" http://localhost:8000/upload_hashed # Same key
- State space exploration: TLA+ checks all possible execution paths
- Invariants: Properties that should always hold (
AtMostOneKeyPerContent
) - Actions: State transitions (
UploadNew
,RetrySame
,UploadDet
) - Specifications: Complete system behavior (
Spec == Init /\ [][Next]_vars
) - Model checking: Automated verification with concrete constants
We tested both approaches using formal verification:
Approach | Result | Verdict |
---|---|---|
Buggy Version (random UUIDs) | ❌ FAILED | Invariant violated - creates duplicates |
Fixed Version (content hashing) | ✅ PASSED | All properties verified - idempotent uploads |
TLA+ discovered this exact sequence that breaks the system:
- Step 1: System starts empty →
Storage: {}
- Step 2: Upload content "c1" with key "k1" →
Storage: {[content: c1, key: k1]}
- Step 3: Upload SAME content "c1" again, get different key "k2" →
Storage: {[content: c1, key: k1], [content: c1, key: k2]}
Result: Same content now has two different keys, violating our core requirement.
TLA+ mathematically proved that content-based hashing works:
- Step 1: System starts empty →
Storage: {}
- Step 2: Upload content "c1", key becomes "c1" →
Storage: {[content: c1, key: c1]}
- Any retry: Upload "c1" again, still gets key "c1" →
Storage: {[content: c1, key: c1]}
(unchanged!)
Result: Same content always gets the same key - retries are completely safe.
Metric | Buggy Approach | Fixed Approach |
---|---|---|
Idempotency | ❌ Not guaranteed | ✅ Guaranteed |
Storage efficiency | ❌ Creates duplicates | ✅ No duplicates |
Retry safety | ❌ Unsafe | ✅ Completely safe |
Predictability | ❌ Random keys | ✅ Deterministic keys |
This example demonstrates how TLA+ can:
- Catch design flaws before coding (found the UUID bug in 896ms)
- Prove correctness of the fix (mathematical verification)
- Document precise behavior for implementation
- Build confidence in system properties (100% coverage of state space)
TLA+ found a critical flaw that could cause wasted storage costs, production bugs, and poor user experience - all discovered at design time, not in production.
For real systems, TLA+ has found critical bugs in distributed databases, consensus protocols, and cloud services that would be nearly impossible to catch through testing alone.
- Read Leslie Lamport's TLA+ home page
- Try Learn TLA+ interactive tutorial
- Explore TLA+ examples repository