uuid now properly supports version 7 counters
The `uuid` library recently added support for the newly specified version 7 UUID format in its `1.6.0` release. You can use version 7 UUIDs to generate random but sortable identifiers that make great keys in database tables. They combine a Unix timestamp in millisecond precision, a counter of user-specified width, and random data, in that order, into a unique identifier. The initial implementation in `uuid` didn’t use a counter when generating version 7 UUIDs, just the timestamp and random data. That simplified the implementation, but means this code may fail:
```rust
let a = Uuid::now_v7();
let b = Uuid::now_v7();

assert!(a < b);
```
Unless `a` was generated on an earlier millisecond than `b`, there was no guarantee it would sort before it. As of `uuid`’s `1.9.0` release, that ordering is guaranteed.
I’d like to give a big thanks to @sergeyprokhorenko, who jumped on the initial thread, explained how counters in version 7 UUIDs should work, and pointed out various reference implementations. Of those, I’d also like to draw attention to `uuid7`, written by @LiosK. It’s a great and thorough implementation that I spent some time digging through, and it also offers a nice `Iterator`-based generation API.
Oh, I also optimized `Uuid::new_v4()` so it’s now much faster. It’s now just `rng::u128() & 0xFFFFFFFFFFFF4FFFBFFFFFFFFFFFFFFF | 0x40008000000000000000`. It probably wasn’t written that way originally because the code predates Rust’s 128-bit integers.
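Here’s a rough sketch of that bit trick on its own. It applies the same mask and bit pattern to a caller-supplied 128-bit value; `v4_from_random`, `version`, and `variant` are hypothetical helper names, and the random input is passed in rather than drawn from `uuid`’s internal `rng` module, so this is an illustration rather than the library’s code.

```rust
/// Clears every version and variant bit that must be 0 in a version 4 UUID.
const V4_CLEAR_MASK: u128 = 0xFFFF_FFFF_FFFF_4FFF_BFFF_FFFF_FFFF_FFFF;
/// Sets the bits that must be 1: version `0100` in bits 76..=79 and
/// variant `10` in bits 62..=63 (counting from the least significant bit).
const V4_SET_BITS: u128 = 0x4000_8000_0000_0000_0000;

/// Hypothetical helper: turns 128 random bits into a version 4 UUID value.
pub fn v4_from_random(random: u128) -> u128 {
    (random & V4_CLEAR_MASK) | V4_SET_BITS
}

/// Reads back the 4-bit version field.
pub fn version(uuid: u128) -> u8 {
    ((uuid >> 76) & 0xF) as u8
}

/// Reads back the 2-bit variant field.
pub fn variant(uuid: u128) -> u8 {
    ((uuid >> 62) & 0b11) as u8
}
```

Whatever the input is, the version reads back as 4 and the variant as `0b10`; the other 122 bits pass through untouched.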
How the version 7 counter works
Version 7 UUIDs generated by `Uuid::now_v7()` are now encoded as follows:
Width (bits) | Description |
---|---|
48 | Unix timestamp at millisecond precision |
4 | 0x7 (UUID version) |
1 | Counter segment initialized to 0 each millisecond |
11 | Counter segment initialized with random data each millisecond |
2 | 0x2 (UUID variant) |
30 | Counter segment initialized with random data each millisecond |
32 | Random data |
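
To make the layout concrete, here’s a minimal sketch that packs those fields into a `u128`, most significant field first. `encode_v7` is a hypothetical function written for this post, not part of `uuid`, and it takes the counter as a single 42-bit value that it splits around the variant bits.

```rust
/// Mask for the 48-bit millisecond timestamp.
const MILLIS_MASK: u128 = (1u128 << 48) - 1;
/// Mask for the full 42-bit counter.
const COUNTER_MASK: u128 = (1u128 << 42) - 1;
/// Mask for the 30 counter bits that sit below the variant.
const COUNTER_LO_MASK: u128 = (1u128 << 30) - 1;

/// Hypothetical encoder for the table above.
pub fn encode_v7(unix_millis: u64, counter: u64, random: u32) -> u128 {
    let millis = (unix_millis as u128) & MILLIS_MASK;
    let counter = (counter as u128) & COUNTER_MASK;
    // Top 12 counter bits (the 1-bit and 11-bit segments) go above the variant.
    let counter_hi = counter >> 30;
    // The remaining 30 counter bits go below the variant.
    let counter_lo = counter & COUNTER_LO_MASK;

    (millis << 80)              // 48 bits: timestamp
        | (0x7u128 << 76)       //  4 bits: version
        | (counter_hi << 64)    // 12 bits: counter, high part
        | (0b10u128 << 62)      //  2 bits: variant
        | (counter_lo << 32)    // 30 bits: counter, low part
        | random as u128        // 32 bits: random data
}
```

Because the timestamp and counter sit above the random data, comparing two encoded values compares timestamp first, then counter, and only then the random tail.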
From the table, we see the counter occupies 42 bits, with the remaining 32 bits filled with random data. The counter ensures monotonicity. Whenever a new timestamp value is observed from the system clock, the counter is re-initialized to a random value with the highest bit unset. Each UUID generated within that millisecond will share the same timestamp, but increment the counter. If the counter manages to overflow in that millisecond, then it’s treated as a new millisecond. The counter is re-initialized and the timestamp is artificially incremented; that adjusted timestamp stays in use until the system clock advances past it.
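The rules in that paragraph can be sketched as a small state machine. This is my own simplified model, not the code in `uuid`: `V7Counter` is a hypothetical type, the clock reading and the random seed are passed in by the caller so the behaviour is deterministic, and I’m assuming an overflow advances the timestamp by exactly one millisecond.

```rust
/// Width of the version 7 counter.
const COUNTER_BITS: u32 = 42;
const COUNTER_MAX: u64 = (1u64 << COUNTER_BITS) - 1;
/// Keeps the low 41 bits, so a reseeded counter has its highest bit unset.
const RESEED_MASK: u64 = COUNTER_MAX >> 1;

/// Hypothetical counter state: the last timestamp handed out and its counter.
pub struct V7Counter {
    pub last_millis: u64,
    pub counter: u64,
}

impl V7Counter {
    pub fn new() -> Self {
        V7Counter { last_millis: 0, counter: 0 }
    }

    /// Returns the `(timestamp, counter)` pair for the next UUID.
    /// `now_millis` is the system clock reading; `seed` is fresh random data
    /// that is only used when the counter is re-initialized.
    pub fn next(&mut self, now_millis: u64, seed: u64) -> (u64, u64) {
        if now_millis > self.last_millis {
            // New millisecond observed: adopt it and reseed the counter.
            self.last_millis = now_millis;
            self.counter = seed & RESEED_MASK;
        } else if self.counter < COUNTER_MAX {
            // Same millisecond, or the clock hasn't caught up with an
            // artificially advanced timestamp yet: increment.
            self.counter += 1;
        } else {
            // Counter overflow: treat it as a new millisecond.
            self.last_millis += 1;
            self.counter = seed & RESEED_MASK;
        }
        (self.last_millis, self.counter)
    }
}
```

Leaving the highest bit unset on reseed means at least 2^41 increments fit in a millisecond before the overflow branch is ever taken.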
Evolving the `uuid` API
Making version 7 UUIDs support counters in `uuid` without breaking backwards compatibility, and without introducing a lot of new concepts to the existing API, was a bit tricky, but I’m happy with the end result. It’s definitely stretched the original API and isn’t how this would look if I designed it from scratch, but it isn’t too bad. I’ve ended up re-using and extending the existing `ClockSequence` trait, previously designed for counters in version 1 UUIDs. It now looks like this:
```rust
pub trait ClockSequence {
    type Output;

    fn generate_sequence(&self, seconds: u64, subsec_nanos: u32) -> Self::Output;

    // New method
    fn generate_timestamp_sequence(
        &self,
        seconds: u64,
        subsec_nanos: u32,
    ) -> (Self::Output, u64, u32) {
        (
            self.generate_sequence(seconds, subsec_nanos),
            seconds,
            subsec_nanos,
        )
    }

    // New method
    fn usable_bits(&self) -> usize
    where
        Self::Output: Sized,
    {
        cmp::min(128, core::mem::size_of::<Self::Output>())
    }
}
```
The key new method is `ClockSequence::usable_bits()`. Rust doesn’t have a `u14` or `u42` type, so this method lets an implementation communicate how many bits of `Output` its counter actually occupies. The rest can be filled with random data when constructing a `Uuid`.
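As an illustration of what “filling the rest with random data” could look like, here’s a sketch that places a narrow counter in the high bits of a wider field and pads the low bits with random data. `pad_counter` and `low_mask` are hypothetical helpers, and putting the counter in the high bits is my assumption (it keeps the result sortable by counter); the real `uuid` internals may differ.

```rust
/// Mask selecting the lowest `bits` bits of a `u128`.
fn low_mask(bits: u32) -> u128 {
    if bits >= 128 { u128::MAX } else { (1u128 << bits) - 1 }
}

/// Hypothetical helper: builds a `field_bits`-wide value whose most significant
/// `usable_bits` come from `counter` and whose remaining low bits come from `random`.
pub fn pad_counter(counter: u128, usable_bits: u32, field_bits: u32, random: u128) -> u128 {
    assert!(usable_bits <= field_bits && field_bits <= 128);
    let spare_bits = field_bits - usable_bits;
    // `checked_shl` returns `None` for a shift of 128, which only happens
    // when there is no counter at all; the counter part is then empty.
    let counter_part = (counter & low_mask(usable_bits))
        .checked_shl(spare_bits)
        .unwrap_or(0);
    counter_part | (random & low_mask(spare_bits))
}
```

For example, a 14-bit counter in a 16-bit field leaves 2 bits for random data, while a counter that already fills the field leaves none.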
The `ClockSequence::generate_timestamp_sequence()` method lets an implementation adjust the timestamp if necessary. This is used to guarantee monotonicity if the counter wraps.
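To show how the trait’s methods fit together, here’s a sketch of a custom implementation. So that it compiles without the `uuid` crate, it declares a trimmed local stand-in for `ClockSequence` with the default method bodies left out. `Counter14` is a hypothetical 14-bit counter, and nudging the timestamp forward by one nanosecond on wrap is just one possible policy I picked for the example.

```rust
use std::cell::Cell;

/// Local stand-in for the trait, trimmed for this sketch (no default bodies).
pub trait ClockSequence {
    type Output;
    fn generate_sequence(&self, seconds: u64, subsec_nanos: u32) -> Self::Output;
    fn generate_timestamp_sequence(
        &self,
        seconds: u64,
        subsec_nanos: u32,
    ) -> (Self::Output, u64, u32);
    fn usable_bits(&self) -> usize;
}

const MAX_14: u16 = (1u16 << 14) - 1;

/// Hypothetical 14-bit counter stored in a `u16`.
pub struct Counter14 {
    /// `(seconds, subsec_nanos, counter)` of the last value handed out.
    pub last: Cell<(u64, u32, u16)>,
}

impl Counter14 {
    pub fn new() -> Self {
        Counter14 { last: Cell::new((0, 0, 0)) }
    }
}

impl ClockSequence for Counter14 {
    type Output = u16;

    fn generate_sequence(&self, seconds: u64, subsec_nanos: u32) -> u16 {
        self.generate_timestamp_sequence(seconds, subsec_nanos).0
    }

    fn generate_timestamp_sequence(&self, seconds: u64, subsec_nanos: u32) -> (u16, u64, u32) {
        let (last_s, last_n, counter) = self.last.get();
        let next = if (seconds, subsec_nanos) > (last_s, last_n) {
            // The clock moved forward: restart the counter.
            (seconds, subsec_nanos, 0)
        } else if counter < MAX_14 {
            // Same (or earlier) clock reading: keep the timestamp, bump the counter.
            (last_s, last_n, counter + 1)
        } else if last_n < 999_999_999 {
            // Counter wrapped: report a timestamp one nanosecond later.
            (last_s, last_n + 1, 0)
        } else {
            // Counter wrapped at the end of a second: carry into `seconds`.
            (last_s + 1, 0, 0)
        };
        self.last.set(next);
        (next.2, next.0, next.1)
    }

    // Only 14 of the 16 bits in `Output` carry counter data.
    fn usable_bits(&self) -> usize {
        14
    }
}
```

The caller gets back a timestamp that may differ from the one it passed in, which is what keeps successive results ordered even when the counter wraps.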
The `uuid::Timestamp` type has always accepted an `impl ClockSequence` and a Unix timestamp. It can then be used to construct either a version 1 or version 7 UUID. It used to look like this:
```rust
pub struct Timestamp {
    seconds: u64,
    nanos: u32,
    #[cfg(any(feature = "v1", feature = "v6"))]
    counter: u16,
}

impl Timestamp {
    pub fn now(context: impl ClockSequence<Output = u16>) -> Self {
        let (seconds, nanos) = now();

        Timestamp {
            seconds,
            nanos,
            #[cfg(any(feature = "v1", feature = "v6"))]
            counter: context.generate_sequence(seconds, nanos),
        }
    }
}
```
The `u16` counter used for version 1 UUIDs isn’t wide enough for the 42-bit counter we use in version 7 UUIDs, so it’s now generalized to `impl Into<u128>`:
```rust
pub struct Timestamp {
    seconds: u64,
    subsec_nanos: u32,
    counter: u128,
    usable_counter_bits: u8,
}

impl Timestamp {
    pub fn now(context: impl ClockSequence<Output = impl Into<u128>>) -> Self {
        let (seconds, subsec_nanos) = now();

        let (counter, seconds, subsec_nanos) =
            context.generate_timestamp_sequence(seconds, subsec_nanos);
        let counter = counter.into();
        let usable_counter_bits = context.usable_bits() as u8;

        Timestamp {
            seconds,
            subsec_nanos,
            counter,
            usable_counter_bits,
        }
    }
}
```
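The reason `impl Into<u128>` works as the generalization is that every unsigned integer type converts losslessly into `u128`. A tiny sketch, using a hypothetical `CounterParts` struct and `widen` function that mirror the two new fields rather than the real `Timestamp`:

```rust
/// Hypothetical mirror of the two counter fields in the new `Timestamp`.
pub struct CounterParts {
    pub counter: u128,
    pub usable_counter_bits: u8,
}

/// Accepts any counter type that converts into `u128`, plus its usable width.
pub fn widen(counter: impl Into<u128>, usable_bits: usize) -> CounterParts {
    CounterParts {
        counter: counter.into(),
        usable_counter_bits: usable_bits as u8,
    }
}
```

A 14-bit counter held in a `u16` and a 42-bit counter held in a `u64` both go through the same signature, for example `widen(0x3FFFu16, 14)` and `widen((1u64 << 42) - 1, 42)`.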
The version 7 counter is implemented by a new `ContextV7` type to go along with the existing `Context` type. A static `Mutex<ContextV7>` is used internally by `Uuid::now_v7()` to guarantee ordering.
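The role of the static mutex can be sketched with nothing but the standard library. This is a deliberately simplified model, not the code in `uuid`: `STATE` and `next_ordered` are hypothetical names, the counter restarts at 0 instead of a random value, and overflow handling is left out.

```rust
use std::sync::Mutex;

/// Hypothetical process-wide state: (last timestamp in ms, counter).
static STATE: Mutex<(u64, u64)> = Mutex::new((0, 0));

/// Returns a `(timestamp, counter)` pair that sorts after every pair returned
/// earlier, no matter which thread asked for it.
pub fn next_ordered(now_millis: u64) -> (u64, u64) {
    // Holding the lock makes "read the state, update it, return it" atomic.
    let mut state = STATE.lock().unwrap();
    if now_millis > state.0 {
        // New millisecond: adopt it and restart the counter.
        *state = (now_millis, 0);
    } else {
        // Same millisecond or a clock that went backwards: bump the counter.
        state.1 += 1;
    }
    *state
}
```

Without the lock, two threads could read the same state and hand out the same pair; with it, every call observes the result of the previous one.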
Evolution is hard
My biggest goal with `uuid` is to keep the API stable. It’s definitely not perfect, and I’ve made a few mistakes with it along the way. I haven’t ruled out making a semver-breaking `2.0.0` release sometime in the (not-too-near) future to clean a few things up, but I want to avoid doing anything that changes the `Uuid` type itself. If I could go back again, I think I would try to keep `Uuid` independent of any other types in the library and use top-level functions like `uuid::v4()` to generate them. Just keep `Uuid` focused on being pure data. That way, we could evolve a higher-level API around the `Uuid` type without breaking the `Uuid` type itself. That’s something I’d recommend to library authors: be conscious of the reach of each type in your API, because changing one of them will cascade through the others it touches.