some facets have incorrect offsets

did:plc:oio4hkxaop4ao4wz2pp3f4cr opened this Nov 15, 2025 2 comments

Some facet offsets are off, most likely because of non-ASCII characters, which encode to more than one byte in UTF-8. Facet offsets are counted in bytes, so they need to be computed on the byte array of the UTF-8 encoded string rather than on character indices.
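To illustrate the point, here is a minimal Python sketch of converting a character range into UTF-8 byte offsets. The helper names `byte_span` and `slice_by_bytes` and the sample string are made up for this example; they are not taken from the project's code, which may well be structured differently.

```python
def byte_span(text: str, start_char: int, end_char: int) -> tuple[int, int]:
    """Convert a [start, end) range of code-point indices into UTF-8 byte offsets."""
    # Everything before the span, measured in encoded bytes.
    byte_start = len(text[:start_char].encode("utf-8"))
    # The span itself, also measured in encoded bytes.
    byte_end = byte_start + len(text[start_char:end_char].encode("utf-8"))
    return byte_start, byte_end


def slice_by_bytes(text: str, byte_start: int, byte_end: int) -> str:
    """Read back the span a consumer would see when applying byte offsets."""
    return text.encode("utf-8")[byte_start:byte_end].decode("utf-8")


# Hypothetical sample: "é" takes 2 bytes in UTF-8, so offsets after it shift by 1.
sample = "héllo `code` here"
start = sample.index("`code`")   # code-point index 6
end = start + len("`code`")      # code-point index 12

print(byte_span(sample, start, end))                      # (7, 13)
print(slice_by_bytes(sample, *byte_span(sample, start, end)))  # `code`
```

For pure-ASCII text both kinds of offsets coincide, which would explain why only some facets are affected.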

Example block:

There are two relevant indexes in that table: one on `(repo, time)` (repo = user's DID), and one on just `(time)`. Roughly speaking, for those users who follow e.g. 80 or 200 people, it makes more sense to scan the `(repo, time)` index those 80-200 times and collect the 100 most recent posts from all of those found, and for those who follow e.g. 9000 (yes, that happens 😛), it's faster to scan the single `(time)` index until you find 100 relevant posts. But I've been struggling to make Postgres always use the right index.

The first three code sections look fine, but the last one renders as "single (time)", probably because it comes after the "😛" emoji: from that point on, the byte offset of every character is larger than its character offset.
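The effect can be reproduced in a few lines of Python. This is only a sketch on a shortened, hypothetical version of the text above; it assumes the offsets were computed as code-point indices and then read as byte offsets, which is a guess about the cause. The exact amount of shift in the real rendering depends on how the pipeline counts, so the wrong span here will not match the reported one character for character.

```python
# Shortened, hypothetical version of the example text.
text = "9000 (yes, that happens 😛), scan the single `(time)` index"
target = "`(time)`"

# Offsets counted in code points (what a naive implementation produces).
char_start = text.index(target)
char_end = char_start + len(target)

raw = text.encode("utf-8")

# Misusing code-point indices as byte offsets: the span starts too early,
# because the emoji is 1 code point but 4 bytes in UTF-8.
wrong = raw[char_start:char_end].decode("utf-8", errors="replace")

# Offsets counted in bytes, as facets expect.
byte_start = len(text[:char_start].encode("utf-8"))
byte_end = byte_start + len(target.encode("utf-8"))
right = raw[byte_start:byte_end].decode("utf-8")

print(byte_start - char_start)  # 3
print(repr(wrong))              # 'le `(tim'
print(repr(right))              # '`(time)`'
```

Spans before the emoji are unaffected, which matches the first three code sections rendering correctly.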
