fix: handle AT Protocol $bytes type in json_to_ipld

did:plc:wydyrngmxbcsqdvhmd7whmye wants to merge
did:plc:wydyrngmxbcsqdvhmd7whmye opened Mar 14, 2026
# fix: handle `$bytes` in `json_to_ipld`

## What broke

`json_to_ipld` knows about `$link` but not `$bytes`. So this:

```json
{ "ciphertext": { "$bytes": "ygoGIpnVb/HQTIZythM9..." } }
```

gets written to CBOR as a map (`{ "$bytes": "..." }`, major type 5) instead of a raw byte string (major type 2). The [data model spec](https://atproto.com/specs/data-model#bytes) says that `$bytes` is a JSON encoding of raw bytes, not a map.

The PDS doesn't notice; it's consistently wrong in both directions, so JSON -> CBOR -> JSON round-trips fine internally. The problem shows up downstream when trying to send, for example, encrypted bytes.

## How Jetstream breaks

Jetstream uses indigo's `atdata.UnmarshalCBOR`. Its CBOR decoder reads the malformed map into `map[string]any`, `parseMap` spots the `$bytes` key, and routes into [`parseBytes`](https://github.com/bluesky-social/indigo/blob/main/atproto/atdata/parse.go):

```go
func parseBytes(obj map[string]any) (Bytes, error) {
	if len(obj) != 1 {
		return nil, fmt.Errorf("$bytes objects must have a single field")
	}
	v, ok := obj["$bytes"].(string)
	if !ok {
		return nil, fmt.Errorf("$bytes field missing or not a string")
	}
	b, err := base64.RawStdEncoding.DecodeString(v)
	if err != nil {
		return nil, fmt.Errorf("decoding $byte value: %w", err)
	}
	return Bytes(b), nil
}
```

`RawStdEncoding` in Go does not allow padding. If the base64 has `=` padding, this blows up with `"decoding $byte value: illegal base64 data at input byte N"`. Whether the base64 has padding depends on whatever client created the record; Tranquil wasn't decoding or re-encoding it, just passing the string through as-is inside the CBOR map.

Because of this, every create event for a record with `$bytes` fields gets silently dropped from Jetstream if the base64 it contains includes `=` padding. Deletes still worked because Jetstream doesn't read record bytes for those.
This is why, with [`app.opake.grant`](https://tangled.org/sans-self.org/opake.app/blob/main/lexicons/app.opake.grant.json) for example, certain creates weren't showing up on Jetstream while deletes worked fine.

## Why the Node PDS doesn't have this problem

The official TypeScript PDS converts `$bytes` -> `Uint8Array` at the lex layer, before CBOR serialization ever runs. From [`@atproto/lex-json`](https://github.com/bluesky-social/atproto/blob/main/packages/lex/lex-json/src/bytes.ts):

```typescript
export function parseLexBytes(
  input?: Record<string, unknown>,
): Uint8Array | undefined {
  if (!input || !('$bytes' in input)) return undefined

  for (const key in input) {
    if (key !== '$bytes') return undefined
  }

  if (typeof input.$bytes !== 'string') return undefined

  try {
    return fromBase64(input.$bytes)
  } catch {
    return undefined
  }
}
```

[`fromBase64`](https://github.com/bluesky-social/atproto/blob/main/packages/lex/lex-data/src/uint8array-from-base64.ts) uses `Uint8Array.fromBase64` with `lastChunkHandling: 'loose'` (native) or dynamically picks padded/unpadded decoding, so both paths accept `=` padding. By the time CBOR serialization runs, the bytes are already a `Uint8Array`, so the `$bytes` wrapper never leaks through.

## The fix

Adds a `$bytes` check to `json_to_ipld`, following the same pattern as the existing `$link` check. A single-key object with a string value is decoded from standard base64 into `Ipld::Bytes`. Padding is accepted but not required, per spec.

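To make the padding behavior concrete, here is a minimal Go sketch (Go only because the failing decoder above is Go; it is not the Rust code in this PR). `decodeStrict` mirrors the `RawStdEncoding` call quoted from `parseBytes`, while `decodeLenient` is a hypothetical helper showing one way to accept both padded and unpadded input by stripping trailing `=` first. It is more permissive than a strict validator, since it also tolerates surplus `=` characters.

```go
package main

import (
	"encoding/base64"
	"fmt"
	"strings"
)

// decodeStrict uses the unpadded standard alphabet, like the parseBytes
// snippet quoted above. Any '=' in the input is treated as illegal data.
func decodeStrict(s string) ([]byte, error) {
	return base64.RawStdEncoding.DecodeString(s)
}

// decodeLenient is a hypothetical helper (not taken from indigo or this PR):
// it removes trailing '=' and then decodes without padding, so padded and
// unpadded standard base64 are both accepted.
func decodeLenient(s string) ([]byte, error) {
	return base64.RawStdEncoding.DecodeString(strings.TrimRight(s, "="))
}

func main() {
	padded := base64.StdEncoding.EncodeToString([]byte("hi"))      // padded form
	unpadded := base64.RawStdEncoding.EncodeToString([]byte("hi")) // unpadded form

	// The strict decoder rejects the padded form.
	_, err := decodeStrict(padded)
	fmt.Printf("strict(%q): err=%v\n", padded, err)

	// The lenient decoder accepts both forms and yields the same bytes.
	for _, in := range []string{padded, unpadded} {
		b, err := decodeLenient(in)
		fmt.Printf("lenient(%q): %q err=%v\n", in, b, err)
	}
}
```

With the fix in place, Jetstream never reaches this decode path for these records, because the CBOR already carries a byte string instead of a `$bytes` map.
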
## Tests

- `test_json_to_ipld_bytes_simple`; base64 -> bytes
- `test_json_to_ipld_bytes_empty`; empty bytes
- `test_json_to_ipld_bytes_with_special_base64_chars`; `+` and `/` in the base64 (the chars that triggered the original downstream failure)
- `test_json_to_ipld_bytes_unpadded`; padded and unpadded both decode
- `test_json_to_ipld_bytes_produces_cbor_byte_string_not_map`; regression test: asserts CBOR major type 2, not major type 5
- `test_json_to_ipld_bytes_not_confused_with_extra_keys`; `$bytes` with sibling keys stays a map (same as `$link` behavior)
- `test_json_to_ipld_bytes_nested_in_record`; opake-style record with nested `$bytes`, round-tripped through CBOR

## Other issues found

`json_to_ipld` also accepts floats (`Ipld::Float(f)` for non-integer numbers). The AT Protocol data model [explicitly bans these](https://atproto.com/specs/data-model). Not this PR's focus; if you like, I'll submit another PR for it :)
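
For reference on the major-type assertion in the regression test listed under Tests: in CBOR, the major type is the top three bits of an item's initial byte. The Go sketch below illustrates that check with a hypothetical minimal encoder (`encodeByteString`, which only handles payloads shorter than 24 bytes); it is not the PR's Rust test, just the same idea in a few lines.

```go
package main

import "fmt"

// majorType returns the CBOR major type, i.e. the top three bits of the
// initial byte of an encoded item.
func majorType(initial byte) byte {
	return initial >> 5
}

// encodeByteString is a hypothetical, deliberately tiny encoder: for byte
// strings shorter than 24 bytes the header is a single byte, 0x40 | length,
// followed by the raw payload.
func encodeByteString(b []byte) []byte {
	if len(b) >= 24 {
		panic("sketch only handles byte strings shorter than 24 bytes")
	}
	return append([]byte{0x40 | byte(len(b))}, b...)
}

func main() {
	// What the fixed json_to_ipld should lead to: a byte string (major type 2).
	encoded := encodeByteString([]byte{0xca, 0x0a, 0x06})
	fmt.Printf("byte string header %#x, major type %d\n", encoded[0], majorType(encoded[0]))

	// What the bug produced: a one-entry map, whose header byte is 0xa1
	// (major type 5).
	mapHeader := byte(0xa1)
	fmt.Printf("map header %#x, major type %d\n", mapHeader, majorType(mapHeader))
}
```
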
