fix: handle AT Protocol $bytes type in json_to_ipld
did:plc:wydyrngmxbcsqdvhmd7whmye opened Mar 14, 2026
# fix: handle `$bytes` in `json_to_ipld`
## What broke
`json_to_ipld` knows about `$link` but not `$bytes`. So this:
```json
{ "ciphertext": { "$bytes": "ygoGIpnVb/HQTIZythM9..." } }
```
gets written to CBOR as a map (`{ "$bytes": "..." }`, major type 5) instead of
a raw byte string (major type 2). The
[data model spec](https://atproto.com/specs/data-model#bytes) says that
`$bytes` is a JSON encoding of raw bytes, not a map.
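To make the major-type difference concrete, here is a minimal hand-rolled sketch of the two encodings. It is not Tranquil's or indigo's encoder: the helper names (`head`, `encodeByteString`, `encodeText`, `encodeBytesMap`) are hypothetical, and it only handles lengths below 24 so that every CBOR head fits in one byte.

```go
package main

import "fmt"

// head builds a single-byte CBOR head (RFC 8949): the major type in the high
// three bits, the length in the low five. Only valid for lengths below 24.
func head(major byte, n int) byte {
	if n < 0 || n >= 24 {
		panic("sketch only handles lengths below 24")
	}
	return major<<5 | byte(n)
}

// encodeByteString encodes b as a CBOR byte string (major type 2).
func encodeByteString(b []byte) []byte {
	return append([]byte{head(2, len(b))}, b...)
}

// encodeText encodes s as a CBOR text string (major type 3).
func encodeText(s string) []byte {
	return append([]byte{head(3, len(s))}, s...)
}

// encodeBytesMap encodes the un-decoded wrapper {"$bytes": b64} as a CBOR map
// (major type 5) holding one key/value pair of text strings.
func encodeBytesMap(b64 string) []byte {
	out := []byte{head(5, 1)}
	out = append(out, encodeText("$bytes")...)
	return append(out, encodeText(b64)...)
}

func main() {
	correct := encodeByteString([]byte{1, 2, 3})
	wrong := encodeBytesMap("AQID") // base64 of the same three bytes
	fmt.Printf("byte string: % x (major type %d)\n", correct, correct[0]>>5)
	fmt.Printf("map:         % x (major type %d)\n", wrong, wrong[0]>>5)
}
```

The first byte alone tells a decoder which of the two it is looking at, which is what the regression test in this PR checks.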
The PDS doesn't notice because it's consistently wrong in both directions:
JSON -> CBOR -> JSON round-trips fine internally. The problem only shows up downstream, for example when a record carries encrypted bytes.
## How Jetstream breaks
Jetstream uses indigo's `atdata.UnmarshalCBOR`. Its CBOR decoder reads the
malformed map into `map[string]any`, `parseMap` spots the `$bytes` key, and
routes into
[`parseBytes`](https://github.com/bluesky-social/indigo/blob/main/atproto/atdata/parse.go):
```go
func parseBytes(obj map[string]any) (Bytes, error) {
	if len(obj) != 1 {
		return nil, fmt.Errorf("$bytes objects must have a single field")
	}
	v, ok := obj["$bytes"].(string)
	if !ok {
		return nil, fmt.Errorf("$bytes field missing or not a string")
	}
	b, err := base64.RawStdEncoding.DecodeString(v)
	if err != nil {
		return nil, fmt.Errorf("decoding $byte value: %w", err)
	}
	return Bytes(b), nil
}
```
`RawStdEncoding` in Go does not allow padding. If the base64 has `=` padding,
this blows up with `"decoding $byte value: illegal base64 data at input byte N"`.
Whether the base64 has padding depends on whatever client created the record;
Tranquil wasn't decoding or re-encoding it, just passing the string through
as-is inside the CBOR map.
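The padding behavior is easy to reproduce with the Go standard library alone. The sketch below isolates the one decoding call from `parseBytes`; `decodeStrict` is a hypothetical name used only for this illustration.

```go
package main

import (
	"encoding/base64"
	"fmt"
)

// decodeStrict isolates the decoding call used in parseBytes: RawStdEncoding
// is the standard alphabet without padding, so any '=' is treated as corrupt.
func decodeStrict(s string) ([]byte, error) {
	return base64.RawStdEncoding.DecodeString(s)
}

func main() {
	inputs := []string{
		"aGVs", // 3 bytes ("hel"): never needs padding, always decodes
		"aGk",  // 2 bytes ("hi"), unpadded form: decodes
		"aGk=", // 2 bytes ("hi"), padded form: rejected
	}
	for _, in := range inputs {
		b, err := decodeStrict(in)
		fmt.Printf("%q -> %q, err=%v\n", in, b, err)
	}
}
```

Only payloads whose length is not a multiple of three can carry padding at all, which is why the failure looked intermittent.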
Because of this, every create event for a record with a `$bytes` field is silently
dropped from Jetstream if the base64 it contains carries `=` padding. Deletes still work
because Jetstream doesn't read record bytes for those.
This is why, for example, certain [`app.opake.grant`](https://tangled.org/sans-self.org/opake.app/blob/main/lexicons/app.opake.grant.json)
creates weren't showing up on Jetstream while deletes worked fine.
## Why the Node PDS doesn't have this problem
The official TypeScript PDS converts `$bytes` -> `Uint8Array` at the lex layer,
before CBOR serialization ever runs. From
[`@atproto/lex-json`](https://github.com/bluesky-social/atproto/blob/main/packages/lex/lex-json/src/bytes.ts):
```typescript
export function parseLexBytes(
  input?: Record<string, unknown>,
): Uint8Array | undefined {
  if (!input || !('$bytes' in input)) return undefined
  for (const key in input) {
    if (key !== '$bytes') return undefined
  }
  if (typeof input.$bytes !== 'string') return undefined
  try {
    return fromBase64(input.$bytes)
  } catch {
    return undefined
  }
}
```
[`fromBase64`](https://github.com/bluesky-social/atproto/blob/main/packages/lex/lex-data/src/uint8array-from-base64.ts)
either uses `Uint8Array.fromBase64` with `lastChunkHandling: 'loose'` (native) or
dynamically picks padded/unpadded decoding. Both paths accept `=` padding.
By the time CBOR serialization runs, the bytes are already a `Uint8Array`, so
the `$bytes` wrapper never leaks through.
## The fix
Add a `$bytes` check to `json_to_ipld`, following the same pattern as the existing `$link`
check. A single-key object with a string value is decoded from standard base64 into `Ipld::Bytes`. Padding is accepted but not required, per spec.
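The decoding rule can be sketched as follows. This is an illustration in Go, matching the other snippets here, and not the Rust code in this PR: `decodeBytesObject` is a hypothetical name, and returning `false` on invalid base64 is an assumption of the sketch rather than a statement about what `json_to_ipld` does in that case.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// decodeBytesObject reports whether obj is exactly {"$bytes": "<base64>"} and,
// if so, returns the decoded bytes. Objects with sibling keys or a non-string
// value are left alone (ok == false), so the caller can keep them as a map.
func decodeBytesObject(obj map[string]any) (decoded []byte, ok bool) {
	if len(obj) != 1 {
		return nil, false
	}
	s, isString := obj["$bytes"].(string)
	if !isString {
		return nil, false
	}
	// Accept padded and unpadded input by trimming trailing '=' and decoding
	// as unpadded. Note this is lenient: it also tolerates excess padding.
	b, err := base64.RawStdEncoding.DecodeString(strings.TrimRight(s, "="))
	if err != nil {
		return nil, false // assumption: invalid base64 is not treated as bytes
	}
	return b, true
}

func main() {
	for _, obj := range []map[string]any{
		{"$bytes": "aGk="},           // padded
		{"$bytes": "aGk"},            // unpadded
		{"$bytes": "aGk=", "x": 1.0}, // sibling key: stays a map
	} {
		b, ok := decodeBytesObject(obj)
		fmt.Printf("%v -> %q, ok=%v\n", obj, b, ok)
	}
}
```

Trimming padding before decoding keeps the sketch to one code path; a stricter variant could validate the padding length instead.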
## Tests
- `test_json_to_ipld_bytes_simple`; base64 -> bytes
- `test_json_to_ipld_bytes_empty`; empty bytes
- `test_json_to_ipld_bytes_with_special_base64_chars`; `+` and `/` in
the base64 (the chars that triggered the original downstream failure)
- `test_json_to_ipld_bytes_unpadded`; padded and unpadded both decode
- `test_json_to_ipld_bytes_produces_cbor_byte_string_not_map`; regression
test: asserts CBOR major type 2, not major type 5
- `test_json_to_ipld_bytes_not_confused_with_extra_keys`; `$bytes` with
sibling keys stays a map (same as `$link` behavior)
- `test_json_to_ipld_bytes_nested_in_record`; opake-style record with nested
`$bytes`, round-tripped through CBOR
## Other issues found
`json_to_ipld` also accepts floats (`Ipld::Float(f)` for non-integer numbers).
The AT Protocol data model
[explicitly bans these](https://atproto.com/specs/data-model). That's not this PR's
focus, but if you like I'll submit another PR for it :)