Practical Engineering Guidelines

Welcome to the Practical Engineering Collection: an ongoing, curated list of frameworks, important books and articles, talks and videos, libraries, tutorials, coding best practices, and other technical resources about practical engineering.

Thanks to our daily readers and contributors. The goal is to build a categorized, community-driven collection of well-known resources. Suggestions and contributions are always welcome!

`Table of Contents`

Papers

Fault Injection in Production (Allspaw)
Making Reliable Distributed Systems in the Presence of Software Errors (Armstrong)
Highly Available Transactions: Virtues and Limitations (Bailis et al.)
The Incident Command System (Bigley and Roberts)
The Chubby Lock Service for Loosely Coupled Distributed Systems (Burrows)
Bigtable: a Distributed Storage System for Structured Data (Chang et al.)
Spanner: Google’s Globally-Distributed Database (Corbett et al.)
Dynamo: Amazon’s Highly Available Key-Value Store (DeCandia et al.)
MapReduce: Simplified Data Processing on Large Clusters (Dean and Ghemawat)
The Google File System (Ghemawat et al.)
On Designing and Deploying Internet Scale Services (Hamilton)
Kafka: A Distributed Messaging System for Log Processing (Kreps et al.)
Weathering the Unexpected (Krishnan)
The Unified Logging Infrastructure for Data Analytics at Twitter (Lee et al.)
Automatic Management of Partitioned, Replicated Search Services (Leibert et al.)
Learning to Embrace Failure (Limoncelli et al.)
Scaling Big Data Mining Infrastructure: The Twitter Experience (Lin and Ryaboy)
Dremel: Interactive Analysis of Web-Scale Datasets (Melnik et al.)
Out of the Tar Pit (Moseley and Marks)
The Log-Structured Merge-Tree (O'Neil et al.)
In Search of an Understandable Consensus Algorithm (Ongaro and Ousterhout)
Failure Trends in a Large Disk Drive Population (Pinheiro et al.)
Fallacies of Distributed Computing Explained (Rotem-Gal-Oz)
F1 - The Fault-Tolerant Distributed RDBMS Supporting Google’s Ad Business (Shute et al.)
Dapper, A Large Scale Distributed Systems Tracing Infrastructure (Sigelman et al.)
Resilient Distributed Datasets: a Fault-Tolerant Abstraction for In-Memory Cluster Computing (Zaharia et al.)
The Human Side of Postmortems (Zwieback)
Crew Resource Management: a Positive Change for the Fire Service

Posts

Resilience Engineering: Part I, Part II (Allspaw)
Systems Engineering: a Great Definition (Allspaw)
Chaos Monkey Released Into The Wild (Bennett and Tseitlin)
Some Rules for Engineering and Operations (Black)
Service Level Disagreements Part I, Part II (Black)
Incuriosity Will Kill Your Infrastructure (Crayford)
My Philosophy on Alerting (Ewaschuk)
You Can’t Sacrifice Partition Tolerance (Hale)
Customer Trust (Hamilton)
Observations on Errors, Corrections, & Trust of Dependent Systems (Hamilton)
Game Day Exercises at Stripe: Learning from kill -9 (Hedlund)
Life Beyond Distributed Transactions: An Apostate’s Opinion (Helland)
Notes on Distributed Systems for Young Bloods (Hodges)
The Network is Reliable (Kingsbury)
The Trouble with Clocks (Kingsbury)
Call Me Maybe: Final Thoughts (Kingsbury)
Getting Real About Distributed Systems Reliability (Kreps)
The Log: What every software engineer should know about real-time data's unifying abstraction (Kreps)
Incident Response at Heroku (McGranaghan)
On HTTP Load Testing (Nottingham)
Observability at Twitter (Watson)
Stevey’s Google Platforms Rant (Yegge)

Presentations

Design, Lessons, and Advice from Building Distributed Systems at Google (Dean)
Service Design Best Practices (Hamilton)

Books

The Field Guide To Understanding Human Error (Dekker)
Agile Retrospectives: Making Good Teams Great (Derby et al.)
Better: A Surgeon’s Notes on Performance (Gawande)
The Checklist Manifesto: How to Get Things Right (Gawande)
High Performance Browser Networking (Grigorik)
Resilience Engineering in Practice (Hollnagel et al.)
Effective Monitoring and Alerting (Ligus)
Release It!: Design and Deploy Production-Ready Software (Nygard)
The Challenger Launch Decision (Vaughan)
Managing the Unexpected (Weick and Sutcliffe)

Research Groups

Conferences

License

MIT License & CC license

This work is licensed under a Creative Commons Attribution 4.0 International License.
