Time, Clocks, and the Ordering of Events in a Distributed System

Notes / Computer Science History / Time, Clocks, and the Ordering of Events in a Distributed System

Author(s): Leslie Lamport

Date: 1978

Notes

  • “A distributed system consists of a collection of distinct processes which are spatially separated, and which communicate with one another by exchanging messages.”
  • Clocks aren’t perfectly accurate and aligning real clocks is difficult.
  • How do we know the order of two events?
  • We can’t rely on ordering based on the clocks of the individual systems
  • Partial Ordering
    • A simple “happens before” relationship to determine the order between two events within the same process (state machines)
    • When two events cross in the event stream, we can’t really know which one came first. For partial ordering, it’s arbitrary
  • Logical Clocks
    • Sending a timestamp along with the message, the receiver can use the timestamp and increment it to a greater value, setting its own local clock
    • In a distributed system, each process keeps its own clock
  • Total Ordering
    • Use logical clocks to order events and then use an arbitrary way to determine which came first for concurrent events
    • Can be used to solve the mutual exlusion problem. Only one process can use a resource at a time so the processes must
    • Each process maintains its own request queue which is never seen by another process.
    • Defines the following algorithm: 1. Process P1 sends a message to all other processes to request a resource with a timestamp on the message 2. All other processes acknowledge the request on their request queue, and send a timestamped acknowledgement message to P1 3. P1 releases the resource by removing the requests resource message from its request queue and sends a timestamped releases resource message to all other processes 4. All other processes receive that releases resource message and removes the requests resource message from their request queues
      • Process P1 gets the resource when there is a requests resource message in its request queue, ordered before any other request in its queue, and P1 has received a message from every other process acknowledging the request after its initial timestamp
      • This algorithm is ideal in situations where there is no central process or central storage
  • Physical Clocks
    • A clock synchronizing algorithm to reduce drift among physical clocks using the current value of the process physical clock and the timestamp of the message received
  • Drawbacks
    • Fails to call out retries or errors/failing scenarios
  • “Almost nobody thought it was about state machines”…“Never understood why it’s so important in computer science… but what people get out of it is a way of thinking about distributed systems that felt obvious to me but wasn’t obvious…and got them thinking a different way”. - Interview with Leslie Lamport
    • Thinks about this as a physics problem, not a math problem.

Resources