I have spent far too much time designing a custom ID type for my current project. I wanted to use it as the primary key in a SQLite database, which imposes some constraints. Specifically, the ID must fit within 63 bits, since SQLite stores integers as signed 64-bit values and I want to avoid negative ones. (Technically, negative IDs would work, but they’re not ideal.)
You might be thinking, “Why not just use a BLOB as the primary key? That gives you much more flexibility.” And that’s a fair point, but I am intentionally avoiding that because of how SQLite handles its hidden rowid. When you use an integer as the primary key, SQLite internally aliases it to the rowid, which makes operations significantly faster. Using a BLOB would remove that performance advantage and make the database larger.
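To make the rowid aliasing concrete, here is a small sketch using Python's built-in `sqlite3` module. The table and column names are made up for illustration; the point is only that an `INTEGER PRIMARY KEY` column and the rowid are the same value, while a BLOB key leaves a separate hidden rowid in place.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# INTEGER PRIMARY KEY: the id column becomes an alias for the rowid.
conn.execute("CREATE TABLE int_items (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO int_items (id, name) VALUES (?, ?)", (1234567890123, "a"))
int_row = conn.execute("SELECT rowid, id FROM int_items").fetchone()

# BLOB PRIMARY KEY: SQLite still assigns its own hidden rowid
# and maintains the primary key through an additional index.
conn.execute("CREATE TABLE blob_items (id BLOB PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO blob_items (id, name) VALUES (?, ?)", (b"\x01\x02\x03\x04", "a"))
blob_row = conn.execute("SELECT rowid, id FROM blob_items").fetchone()

print(int_row)   # rowid and id are the same number
print(blob_row)  # rowid is SQLite's own counter, unrelated to the BLOB
```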
So the next step is choosing the bit layout. The first bit is unused to prevent negative values. Then I went with 43 bits for a Unix millisecond timestamp. This gives me roughly 278 years of range, which should be plenty. With the default Unix epoch, this works until the year 2248. It will outlive me, so that is more than enough.
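The arithmetic behind the 43-bit range can be checked in a few lines. This is just a sanity check; the constant name is mine.

```python
from datetime import datetime, timedelta, timezone

TIMESTAMP_BITS = 43

# Largest millisecond count that fits in 43 bits.
max_ms = (1 << TIMESTAMP_BITS) - 1

# Convert to years, using 365.25 days per year as an approximation.
years = max_ms / 1000 / 60 / 60 / 24 / 365.25

# Last representable moment when counting from the Unix epoch.
epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
last_moment = epoch + timedelta(milliseconds=max_ms)

print(int(years))        # 278
print(last_moment.year)  # 2248
```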
The remaining 20 bits are random, which gives 1_048_576 possible values. I am using random values because I don’t want to keep track of state (as with an autoincrement), and my current system can handle collisions. It is still possible to swap approaches down the road while keeping the already generated IDs. 1_048_576 sounds like a lot, but the chance of a collision already reaches 1% after generating only 146 IDs. Then again, those IDs would all need to be generated within the same millisecond. I am not expecting that much volume.
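The 146 figure comes from the birthday problem. A quick way to reproduce it, assuming the 20 random bits are uniformly distributed (the function names are mine):

```python
def collision_probability(n: int, space: int) -> float:
    """Chance that at least two of n uniformly random values out of `space` collide."""
    p_all_unique = 1.0
    for i in range(n):
        p_all_unique *= (space - i) / space
    return 1.0 - p_all_unique


def ids_for_probability(target: float, space: int) -> int:
    """Smallest number of IDs whose collision chance reaches `target`."""
    n = 1
    while collision_probability(n, space) < target:
        n += 1
    return n


RANDOM_SPACE = 1 << 20  # 1_048_576 possible random values

print(ids_for_probability(0.01, RANDOM_SPACE))  # 146
```

Keep in mind this is the chance within a single millisecond; IDs from different milliseconds can never collide because their timestamp bits differ.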
Bit  | 63 (MSB) | 62 ... 20 | 19 ... 0
-----|----------|-----------|----------
Use  | Unused   | Timestamp | Random
Size | 1 bit    | 43 bits   | 20 bits

Some example IDs: 1JTWRZPBJ4DSE, 1JTWRZSPA1NBS, 1JTWRZTQSE9G6, 1JTWRZVY5R7RT
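The layout above could be packed into an integer roughly like this. It is a minimal sketch, not my actual implementation: the function names are invented here, and using `secrets` as the random source is an assumption.

```python
import secrets
import time

TIMESTAMP_BITS = 43
RANDOM_BITS = 20


def new_id(now_ms=None):
    """Build a 63-bit ID: 43-bit Unix millisecond timestamp, then 20 random bits."""
    if now_ms is None:
        now_ms = time.time_ns() // 1_000_000
    if now_ms < 0 or now_ms >> TIMESTAMP_BITS:
        raise ValueError("timestamp does not fit in 43 bits")
    # Bit 63 stays zero, so the result is never negative as a signed int64.
    return (now_ms << RANDOM_BITS) | secrets.randbits(RANDOM_BITS)


def split_id(value):
    """Return the (timestamp_ms, random_part) pair stored in an ID."""
    return value >> RANDOM_BITS, value & ((1 << RANDOM_BITS) - 1)


example = new_id()
print(example, split_id(example))
```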
The reason for using a timestamp in the leading bits is to minimize B-tree rebalances. As time advances, the generated IDs grow in sequence, allowing the B-tree to insert new entries without reorganizing older pages. By contrast, a completely random primary key (like a UUID v4) forces the B-tree to rebalance frequently, which can significantly degrade database performance.
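A rough way to see the difference is to count how often a new key lands at the very end of the sorted order. The sketch below uses a plain sorted list as a stand-in for the index, which is a big simplification of what SQLite's B-tree actually does. The key generation is simulated, with one ID per millisecond.

```python
import bisect
import random


def count_tail_inserts(keys):
    """Count how many keys land at the end of the sorted order when inserted one by one."""
    ordered = []
    tail_inserts = 0
    for key in keys:
        pos = bisect.bisect_right(ordered, key)
        if pos == len(ordered):
            tail_inserts += 1
        ordered.insert(pos, key)
    return tail_inserts


rng = random.Random(0)
n = 1000
base_ms = 1_700_000_000_000  # arbitrary starting timestamp for the simulation

# Timestamp-prefixed keys, one per millisecond, with 20 random low bits.
time_keys = [((base_ms + i) << 20) | rng.getrandbits(20) for i in range(n)]

# Fully random 63-bit keys, comparable to a random UUID-style key.
random_keys = [rng.getrandbits(63) for _ in range(n)]

print(count_tail_inserts(time_keys))    # every insert is an append
print(count_tail_inserts(random_keys))  # only a handful are appends
```

Every timestamp-prefixed key is appended at the end, while almost all random keys end up somewhere in the middle of existing data.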
Finally, the string representation: I chose Crockford’s Base32 (without the check digit). Just 13 characters to represent an int64. To me it’s practically perfect from a technical standpoint, and I like how the IDs look. I know aesthetics shouldn’t matter, but this is my project. So I set the rules, and I want things to look cool and have some aesthetic appeal. Looks way better than those stupid UUIDs.
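For reference, a fixed-width Crockford Base32 codec for these IDs can be sketched as below. The function names are mine, and the decoder only handles the basics of the spec (case-insensitivity and the I/L/O substitutions), not hyphens or check symbols.

```python
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"  # Crockford: no I, L, O, U
ID_LENGTH = 13  # 13 * 5 = 65 bits, enough for a 63-bit value


def encode(value):
    """Encode a non-negative 63-bit integer as 13 Crockford Base32 characters."""
    if not 0 <= value < (1 << 63):
        raise ValueError("value must fit in 63 bits")
    chars = []
    for _ in range(ID_LENGTH):
        chars.append(ALPHABET[value & 31])  # lowest 5 bits first
        value >>= 5
    return "".join(reversed(chars))


def decode(text):
    """Decode a 13-character Crockford Base32 string back to an integer."""
    normalized = text.upper().replace("I", "1").replace("L", "1").replace("O", "0")
    if len(normalized) != ID_LENGTH:
        raise ValueError("expected exactly 13 characters")
    value = 0
    for ch in normalized:
        value = (value << 5) | ALPHABET.index(ch)  # ValueError on invalid symbols
    if value >= (1 << 63):
        raise ValueError("decoded value does not fit in 63 bits")
    return value


print(encode(0))              # 0000000000000
print(encode((1 << 63) - 1))  # 7ZZZZZZZZZZZZ
```

Because the encoding is fixed-width and the alphabet is in ascending ASCII order, the strings sort in the same order as the integers they represent.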
One final note: please store your IDs in a binary representation (BLOB or integer). It hurts me every time I see an ID stored as its string representation. It is way slower and wastes storage. It mainly happens with UUIDs; most people don’t realize a UUID is actually a binary ID and not a string. Even the spec states it, but I guess people just don’t read it.
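The size difference is easy to show with Python's standard `uuid` module. The UUID value below is an arbitrary example, and the 63-bit ID is a made-up number.

```python
import uuid

u = uuid.UUID("12345678-1234-5678-1234-567812345678")

as_text = str(u)    # the familiar hyphenated hex form
as_bytes = u.bytes  # the actual 128-bit value

print(len(as_text))   # 36
print(len(as_bytes))  # 16

# Nothing is lost by storing the binary form.
assert uuid.UUID(bytes=as_bytes) == u

# The same applies to a 63-bit ID: 8 bytes as an integer
# versus 13 characters in its Base32 string form.
some_id = 1_782_579_200_000_000_000  # hypothetical ID value
print(len(some_id.to_bytes(8, "big")))  # 8
```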