uuid now properly supports version 7 counters

The uuid library added support for the newly specified version 7 UUID format in its 1.6.0 release. You can use version 7 UUIDs to generate random but sortable identifiers that make great keys in database tables. They combine a Unix timestamp in millisecond precision, a counter of user-specified width, and random data, in that order, into a unique identifier. The initial implementation in uuid didn’t use a counter when generating version 7 UUIDs, just the timestamp and random data. That simplified the implementation, but it meant this code could fail:

let a = Uuid::now_v7();
let b = Uuid::now_v7();

assert!(a < b);

Unless a was generated in an earlier millisecond than b, there was no guarantee that a would sort before b. As of uuid’s 1.9.0 release, that ordering is guaranteed.
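To see why a timestamp followed by random data can’t give that guarantee on its own, here is a minimal std-only sketch. The timestamp_then_random function is a hypothetical helper, not part of uuid, and it ignores the version and variant bits for brevity. The random tail is passed in as a parameter so the behavior is easy to reproduce.

```rust
/// Hypothetical helper (not part of uuid): packs a 48-bit millisecond
/// timestamp above 80 bits of caller-supplied "random" data.
/// Version and variant bits are left out to keep the sketch short.
fn timestamp_then_random(unix_millis: u64, random: u128) -> u128 {
    const RANDOM_BITS: u32 = 80;

    // Keep only the low 48 bits of the timestamp.
    let timestamp = (unix_millis as u128) & ((1u128 << 48) - 1);
    // Keep only the low 80 bits of the random data.
    let tail = random & ((1u128 << RANDOM_BITS) - 1);

    // The timestamp is the most significant part, so it dominates ordering
    // only when two identifiers come from different milliseconds.
    (timestamp << RANDOM_BITS) | tail
}
```

Two identifiers built with the same unix_millis compare purely by their random tails, so the one generated first can easily sort last. Across different milliseconds the timestamp decides the order regardless of the tail.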

I’d like to give a big thanks to @sergeyprokhorenko, who jumped on the initial thread, explained how counters in version 7 UUIDs should work, and pointed out various reference implementations. Of those, I’d also like to draw attention to uuid7, written by @LiosK. It’s a great, thorough implementation that I spent some time digging through, and it also offers a nice Iterator-based generation API.

Oh, I also optimized Uuid::new_v4() so it’s now much faster. It’s now just rng::u128() & 0xFFFFFFFFFFFF4FFFBFFFFFFFFFFFFFFF | 0x40008000000000000000. It probably wasn’t written that way originally because the code predates 128-bit integers in Rust.
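The masking in that expression can be checked in isolation. The sketch below is std-only: v4_bits, version, and variant are hypothetical helper names, and the random parameter stands in for whatever the RNG returns.

```rust
/// Hypothetical helper: turns 128 arbitrary bits into the bit pattern of a
/// version 4 UUID. `random` stands in for the output of the RNG.
fn v4_bits(random: u128) -> u128 {
    // The AND clears the bits that must be zero in the version nibble
    // (leaving 0b0100 at most) and the second-highest variant bit.
    // The OR then sets the version to 4 and the top variant bit to 1.
    (random & 0xFFFFFFFFFFFF4FFFBFFFFFFFFFFFFFFF) | 0x40008000000000000000
}

/// Reads the 4-bit version field (bits 76 to 79).
fn version(uuid: u128) -> u8 {
    ((uuid >> 76) & 0xF) as u8
}

/// Reads the 2-bit variant field (bits 62 and 63).
fn variant(uuid: u128) -> u8 {
    ((uuid >> 62) & 0b11) as u8
}
```

Whatever the input, version() reports 4 and variant() reports 0b10, while the other 122 bits pass through untouched.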

How the version 7 counter works

Version 7 UUIDs generated by Uuid::now_v7() are now encoded as follows:

Width (bits)	Description
48	Unix timestamp at millisecond precision
4	`0x7` (UUID version)
1	Counter segment initialized to `0` each millisecond
11	Counter segment initialized with random data each millisecond
2	`0x2` (UUID variant)
30	Counter segment initialized with random data each millisecond
32	Random data

From the table, we see the counter occupies 42 bits, with the remaining 32 bits filled with random data. The counter ensures monotonicity. Whenever a new timestamp value is observed from the system clock, the counter is re-initialized to a random value with the highest bit unset. Each UUID generated within that millisecond will share the same timestamp, but increment the counter. If the counter manages to overflow in that millisecond then it’s treated as a new millisecond. The counter is re-initialized and the timestamp is artificially incremented until the system clock advances past it.
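The rules above can be sketched in a few lines of std-only Rust. This is an illustration of the described behavior, not the code in uuid: CounterV7, next, and encode are hypothetical names, and random bits are passed in by the caller so the sketch needs no RNG dependency.

```rust
const COUNTER_BITS: u32 = 42;
const COUNTER_MAX: u64 = (1 << COUNTER_BITS) - 1;
// Re-initialized counters keep only 41 bits, leaving the highest bit unset.
const RESEED_MASK: u64 = (1 << (COUNTER_BITS - 1)) - 1;

/// Hypothetical stand-in for the state behind a version 7 counter.
struct CounterV7 {
    last_millis: u64,
    counter: u64,
}

impl CounterV7 {
    fn new() -> Self {
        CounterV7 { last_millis: 0, counter: 0 }
    }

    /// Returns the (timestamp, counter) pair to encode for a UUID requested
    /// at `now_millis`. `random` supplies fresh random bits for re-seeding.
    fn next(&mut self, now_millis: u64, random: u64) -> (u64, u64) {
        if now_millis > self.last_millis {
            // A new millisecond was observed: re-initialize the counter.
            self.last_millis = now_millis;
            self.counter = random & RESEED_MASK;
        } else if self.counter < COUNTER_MAX {
            // Same (or earlier) clock reading: keep the timestamp, bump the counter.
            self.counter += 1;
        } else {
            // Counter overflow: treat it as a new millisecond.
            self.last_millis += 1;
            self.counter = random & RESEED_MASK;
        }
        (self.last_millis, self.counter)
    }
}

/// Lays the fields out as in the table: 48-bit timestamp, version, 12 counter
/// bits, variant, 30 counter bits, 32 random bits.
fn encode(millis: u64, counter: u64, random: u32) -> u128 {
    let timestamp = (millis as u128) & 0xFFFF_FFFF_FFFF;
    let counter_hi = ((counter >> 30) as u128) & 0xFFF;
    let counter_lo = (counter as u128) & 0x3FFF_FFFF;

    (timestamp << 80)
        | (0x7u128 << 76)
        | (counter_hi << 64)
        | (0x2u128 << 62)
        | (counter_lo << 32)
        | (random as u128)
}
```

Because the timestamp never moves backwards and the counter only grows within a timestamp, each encoded value compares greater than the one before it, even when the trailing 32 random bits happen to be smaller.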

Evolving the `uuid` API

Making version 7 UUIDs support counters in uuid without breaking backwards compatibility, and without introducing a lot of new concepts to the existing API, was a bit tricky, but I’m happy with the end result. It definitely stretches the original API and isn’t how this would look if I designed it from scratch, but it isn’t too bad. I ended up re-using and extending the existing ClockSequence trait, previously designed for counters in version 1 UUIDs. It now looks like this:

pub trait ClockSequence {
    type Output;

    fn generate_sequence(&self, seconds: u64, subsec_nanos: u32) -> Self::Output;

    // New method
    fn generate_timestamp_sequence(
        &self,
        seconds: u64,
        subsec_nanos: u32,
    ) -> (Self::Output, u64, u32) {
        (
            self.generate_sequence(seconds, subsec_nanos),
            seconds,
            subsec_nanos,
        )
    }

    // New method
    fn usable_bits(&self) -> usize
    where
        Self::Output: Sized,
    {
        cmp::min(128, core::mem::size_of::<Self::Output>())
    }
}

The key new method is ClockSequence::usable_bits(). Rust doesn’t have a u14 or u42 type, so this method lets an implementation communicate how many bits of Output its counter actually occupies. The rest can be filled with random data when constructing a Uuid.

The ClockSequence::generate_timestamp_sequence() method lets an implementation adjust the timestamp if necessary. This is used to guarantee monotonicity if the counter wraps.
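As a rough illustration of what usable_bits() communicates, here is a std-only sketch. The trait below is a reduced local stand-in so the example compiles without the crate; Counter14, low_bits, and counter_then_random are hypothetical, and whether uuid fills the spare bits exactly this way is an assumption.

```rust
use std::sync::atomic::{AtomicU16, Ordering};

// Reduced local stand-in for the trait, so the sketch compiles on its own.
trait ClockSequence {
    type Output;
    fn generate_sequence(&self, seconds: u64, subsec_nanos: u32) -> Self::Output;
    fn usable_bits(&self) -> usize;
}

/// Hypothetical counter that only ever produces 14-bit values.
struct Counter14(AtomicU16);

impl ClockSequence for Counter14 {
    type Output = u16;

    fn generate_sequence(&self, _seconds: u64, _subsec_nanos: u32) -> u16 {
        // fetch_add wraps on overflow; the mask keeps the result within 14 bits.
        self.0.fetch_add(1, Ordering::AcqRel) & 0x3FFF
    }

    fn usable_bits(&self) -> usize {
        14
    }
}

/// A mask with the lowest `bits` bits set.
fn low_bits(bits: u32) -> u128 {
    if bits >= 128 { u128::MAX } else { (1u128 << bits) - 1 }
}

/// Places the counter in the top of a `field_bits`-wide field and fills the
/// bits the counter does not use with caller-supplied random data.
fn counter_then_random<C>(context: &C, field_bits: u32, random: u128) -> u128
where
    C: ClockSequence,
    C::Output: Into<u128>,
{
    let usable = context.usable_bits() as u32;
    assert!(usable >= 1 && usable <= field_bits && field_bits <= 128);

    let counter: u128 = context.generate_sequence(0, 0).into();
    let spare = field_bits - usable;

    ((counter & low_bits(usable)) << spare) | (random & low_bits(spare))
}
```

With a Counter14 and a 16-bit field, the counter lands in the upper 14 bits and only the lower 2 bits come from the random input. Without usable_bits(), the caller would have to treat all 16 bits of the u16 as counter.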

The uuid::Timestamp type has always accepted an impl ClockSequence and a Unix timestamp. It can then be used to construct either a version 1 or version 7 UUID. It used to look like this:

pub struct Timestamp {
    seconds: u64,
    nanos: u32,
    #[cfg(any(feature = "v1", feature = "v6"))]
    counter: u16,
}

impl Timestamp {
    pub fn now(context: impl ClockSequence<Output = u16>) -> Self {
        let (seconds, nanos) = now();

        Timestamp {
            seconds,
            nanos,
            #[cfg(any(feature = "v1", feature = "v6"))]
            counter: context.generate_sequence(seconds, nanos),
        }
    }
}

The u16 counter used for version 1 UUIDs isn’t wide enough for the 42-bit counter we use in version 7 UUIDs, so the counter type is now generalized to impl Into<u128>:

pub struct Timestamp {
    seconds: u64,
    subsec_nanos: u32,
    counter: u128,
    usable_counter_bits: u8,
}

impl Timestamp {
    pub fn now(context: impl ClockSequence<Output = impl Into<u128>>) -> Self {
        let (seconds, subsec_nanos) = now();

        let (counter, seconds, subsec_nanos) = context.generate_timestamp_sequence(seconds, subsec_nanos);
        let counter = counter.into();
        let usable_counter_bits = context.usable_bits() as u8;

        Timestamp {
            seconds,
            subsec_nanos,
            counter,
            usable_counter_bits,
        }
    }
}

The version 7 counter is implemented by a new ContextV7 type to go along with the existing Context type. A static Mutex<ContextV7> is used internally by Uuid::now_v7() to guarantee ordering.
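The idea of a shared, lock-protected context can be sketched with std alone. This is not the ContextV7 implementation: State and next_ordered are hypothetical, the clock reading is passed in as a parameter, and the counter restarts at 0 instead of a random value to keep the example short.

```rust
use std::sync::Mutex;

/// Hypothetical shared state: the last timestamp handed out and its counter.
struct State {
    last_millis: u64,
    counter: u64,
}

// One process-wide context. `Mutex::new` is const, so no lazy init is needed.
static STATE: Mutex<State> = Mutex::new(State { last_millis: 0, counter: 0 });

/// Returns a (timestamp, counter) pair that is strictly greater than every
/// pair returned before it, no matter which thread asks.
fn next_ordered(now_millis: u64) -> (u64, u64) {
    // Holding the lock makes "read the state, update it, return it" atomic.
    let mut state = STATE.lock().expect("context mutex poisoned");

    if now_millis > state.last_millis {
        // New millisecond: restart the counter (simplified: not randomized).
        state.last_millis = now_millis;
        state.counter = 0;
    } else {
        // Same millisecond, or a clock that went backwards: bump the counter.
        state.counter += 1;
    }

    (state.last_millis, state.counter)
}
```

Because every caller goes through the same lock, two threads can never observe the same pair, and the pairs each thread receives are always increasing.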

Evolution is hard

My biggest goal with uuid is to keep the API stable. It’s definitely not perfect, and I’ve made a few mistakes with it along the way. I haven’t ruled out making a semver-breaking 2.0.0 release sometime in the (not-too-near) future to clean a few things up, but I want to avoid doing anything that changes the Uuid type itself. If I could go back, I think I would try to keep Uuid independent of any other types in the library and use top-level functions like uuid::v4() to generate them. Just keep Uuid focused on being pure data. That way, we could evolve a higher-level API around the Uuid type without breaking the Uuid type itself. That’s something I would recommend to library authors: be conscious of the reach of each type in your API, because changing one of them will cascade through the others it touches.
