OTLP Logs on the Wire: AnyValue’s Oneof Tags, Attribute KVLists, and a Zero-Allocation Rust Protobuf Fast-Path
Logs are “just strings” right up until you ship structured logs over OTLP and discover:
- every attribute is another nested protobuf message,
- `AnyValue` is a `oneof` (branching on the wire),
- and the cost is dominated by length-delimited blobs (wire type 2) with varint lengths.
This deep dive is about the exact bytes involved when you send OTLP logs (over gRPC or HTTP):
- protobuf tags (`(field_number << 3) | wire_type`) and varints
- the on-the-wire layout of `ExportLogsServiceRequest`
- how `AnyValue` encodes its `oneof`
- why `KeyValueList` exists (and how it hurts)
- a pointer-based Rust decoder that can “skim” an OTLP payload without allocations
If you can read these bytes, you can explain why:
- your Collector CPU spikes when someone adds a 2KB JSON blob to `body`,
- Loki ingest via OTLP behaves differently from Loki’s native push API,
- and why “protobuf is fast” becomes untrue once you embed maps-of-maps.
Specs we rely on (manually verified)
These links were fetched and checked for the specific definitions cited.
- OTLP spec: OTLP uses a Protocol Buffers schema and is implemented over gRPC/HTTP.
- Protobuf encoding: varints; tag is `(field_number << 3) | wire_type`; wire types include LEN=2.
- OTLP logs service request: `ExportLogsServiceRequest { repeated ResourceLogs resource_logs = 1; }`
- OTLP logs data model: `LogsData`, `ResourceLogs`, `ScopeLogs`, and `LogRecord`; the `AnyValue`, `KeyValue`, and `KeyValueList` definitions.
- Loki HTTP API docs: Loki exposes both native push and an OTLP logs ingest endpoint (`POST /otlp/v1/logs`).
Layer 0 recap: protobuf records are TLV-ish (tag + payload)
From the protobuf wire format:
- each field record starts with a tag varint: `tag = (field_number << 3) | wire_type`
- for wire type LEN (2) the payload is: `len_varint` + `len` bytes
So most OTLP payloads look like an endless stream of:
```
[tag varint][len varint][len bytes]
[tag varint][len varint][len bytes]
...
```
Mermaid: the byte tape you’re actually parsing
```mermaid
flowchart LR
  subgraph REC["Protobuf record (wire type = LEN)"]
    T["tag varint\n(field<<3 | 2)"] --> L["len varint"] --> P["payload bytes (len)"]
  end
```
Operational implication: OTLP logs ingest is fundamentally scan + varint + bounds-check. If your implementation copies slices around, you lose.
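The tag and varint arithmetic above is easy to verify in a few lines. A minimal sketch (the `tag` and `encode_varint` helpers are mine, for illustration only — not from any OTLP library):

```rust
// Tag arithmetic from the protobuf spec: tag = (field_number << 3) | wire_type.
// The tag itself is then encoded as a varint.
fn tag(field: u32, wire: u8) -> u64 {
    ((field as u64) << 3) | (wire as u64)
}

// LEB128-style varint: 7 payload bits per byte, MSB set on all but the last byte.
fn encode_varint(mut v: u64) -> Vec<u8> {
    let mut out = Vec::new();
    loop {
        let byte = (v & 0x7f) as u8;
        v >>= 7;
        if v == 0 {
            out.push(byte);
            break;
        }
        out.push(byte | 0x80);
    }
    out
}

fn main() {
    // Field 1, LEN=2 -> 0x0A: the usual first byte of an OTLP logs request.
    assert_eq!(tag(1, 2), 0x0A);
    // Field numbers >= 16 push the tag varint past one byte: field 16, LEN -> 82 01.
    assert_eq!(encode_varint(tag(16, 2)), [0x82, 0x01]);
}
```

Note that any field number of 16 or above already costs a two-byte tag, which matters when a schema nests many small messages.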
ExportLogsServiceRequest: why the first byte is usually 0x0A
The OTLP logs service request is:
```protobuf
message ExportLogsServiceRequest {
  repeated opentelemetry.proto.logs.v1.ResourceLogs resource_logs = 1;
}
```

Field 1, wire type LEN=2 → tag = `(1 << 3) | 2 = 0x08 | 0x02 = 0x0A`.
So a request containing exactly one ResourceLogs typically begins as:
| bytes | meaning |
|---|---|
| `0A` | tag: field 1, LEN |
| `..` | varint length of the embedded ResourceLogs message |
| `..` | ResourceLogs bytes |
This is not trivia: if you’re sampling or applying ingress limits, you can detect “payload is OTLP logs” and count nested elements without a full decode.
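As a sketch of that idea, here is a safe, slice-based counter that tallies top-level field-1 LEN records without touching their payloads. The helper names are mine, not from any library:

```rust
/// Count top-level `resource_logs` entries (field 1, LEN) in an
/// ExportLogsServiceRequest without decoding the nested messages.
fn count_resource_logs(mut buf: &[u8]) -> Option<usize> {
    let mut count = 0;
    while !buf.is_empty() {
        let (tag, rest) = read_varint(buf)?;
        let (field, wire) = ((tag >> 3) as u32, (tag & 7) as u8);
        buf = rest;
        match wire {
            0 => buf = read_varint(buf)?.1, // VARINT: skip the value
            2 => {                          // LEN: bounds-check, then skip wholesale
                let (len, rest) = read_varint(buf)?;
                let len = len as usize;
                if rest.len() < len { return None; }
                if field == 1 { count += 1; }
                buf = &rest[len..];
            }
            1 => buf = buf.get(8..)?,       // I64: 4+4 raw bytes
            5 => buf = buf.get(4..)?,       // I32
            _ => return None,               // groups are deprecated; treat as invalid
        }
    }
    Some(count)
}

fn read_varint(buf: &[u8]) -> Option<(u64, &[u8])> {
    let mut x = 0u64;
    for (i, &b) in buf.iter().enumerate().take(10) {
        x |= ((b & 0x7f) as u64) << (7 * i as u32);
        if b & 0x80 == 0 { return Some((x, &buf[i + 1..])); }
    }
    None
}

fn main() {
    // Two empty ResourceLogs messages: 0A 00 0A 00.
    assert_eq!(count_resource_logs(&[0x0A, 0x00, 0x0A, 0x00]), Some(2));
    assert_eq!(count_resource_logs(&[]), Some(0));
}
```

The whole thing is scan + varint + bounds-check, exactly as claimed: the nested ResourceLogs bytes are never inspected.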
The real villain: AnyValue (a oneof that turns types into tags)
AnyValue is defined as a oneof:
```protobuf
message AnyValue {
  oneof value {
    string string_value = 1;
    bool bool_value = 2;
    int64 int_value = 3;
    double double_value = 4;
    ArrayValue array_value = 5;
    KeyValueList kvlist_value = 6;
    bytes bytes_value = 7;
  }
}
```

On the wire, the oneof is simply “which field number appears”.
Concrete: encoding AnyValue { string_value = "hi" }
- field = 1
- wire type = LEN (string)
- tag = `(1<<3)|2 = 0x0A`
Bytes:
| offset | byte(s) | meaning |
|---|---|---|
| 0 | 0A | tag: string_value (field 1, LEN) |
| 1 | 02 | length = 2 |
| 2..3 | 68 69 | ASCII h i |
That’s the cheap case.
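The byte table above can be reproduced in a few lines. A sketch, valid for strings under 128 bytes so the length varint fits in one byte (`encode_anyvalue_string` is an illustrative helper, not a library API):

```rust
/// Hand-encode AnyValue { string_value } for short strings (< 128 bytes,
/// so the length varint is a single byte).
fn encode_anyvalue_string(s: &str) -> Vec<u8> {
    assert!(s.len() < 128);
    let mut buf = vec![0x0A, s.len() as u8]; // tag: field 1, LEN; then the length
    buf.extend_from_slice(s.as_bytes());
    buf
}

fn main() {
    // Matches the byte table above: 0A 02 68 69.
    assert_eq!(encode_anyvalue_string("hi"), [0x0A, 0x02, 0x68, 0x69]);
}
```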
The expensive case: kvlist_value (maps become repeated messages)
kvlist_value is field 6, LEN type → tag = (6<<3)|2 = 0x32.
And inside KeyValueList:
```protobuf
message KeyValueList { repeated KeyValue values = 1; }
message KeyValue { string key = 1; AnyValue value = 2; }
```

So a single “map entry” becomes (at minimum):
- one LEN field for the `KeyValue` message
- inside it: one LEN field for `key`
- and one LEN field for the embedded `AnyValue`
In other words: maps inflate into nested TLV chains.
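To make the inflation concrete, here is the hand-encoding of a single entry `{"k": "v"}` (a sketch with hard-coded bytes; `encode_kv_entry` is my own helper name):

```rust
/// Wire bytes for KeyValue { key: "k", value: AnyValue { string_value: "v" } }.
/// Every nesting level adds its own tag byte and length byte.
fn encode_kv_entry() -> Vec<u8> {
    let mut kv = Vec::new();
    kv.extend_from_slice(&[0x0A, 0x01, b'k']); // KeyValue.key (field 1, LEN), "k"
    kv.extend_from_slice(&[0x12, 0x03]);       // KeyValue.value (field 2, LEN), 3 bytes follow
    kv.extend_from_slice(&[0x0A, 0x01, b'v']); // nested AnyValue.string_value = "v"
    kv
}

fn main() {
    let kv = encode_kv_entry();
    assert_eq!(kv.len(), 8);
    // As one KeyValueList.values entry (field 1, LEN): two more outer bytes.
    let mut entry = vec![0x0A, kv.len() as u8];
    entry.extend_from_slice(&kv);
    assert_eq!(entry.len(), 10); // 10 wire bytes to carry 2 bytes of key/value data
}
```

A 5x overhead on one-character keys and values is the worst case, but it shows why attribute-heavy logs decode slower than their payload size suggests.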
Mermaid: KeyValue as nested LEN fields
```mermaid
flowchart TB
  KV["KeyValue (embedded message)"] --> K["1: key (LEN string)"]
  KV --> V["2: value (LEN AnyValue)"]
  V --> ONEOF["AnyValue oneof field\n(1..7)"]
```
LogRecord.flags: bit fields you should actually use
The logs proto defines LogRecord.flags as fixed32 and also defines LogRecordFlags:
- “Bits 0-7 are used for trace flags.”
- `LOG_RECORD_FLAGS_TRACE_FLAGS_MASK = 0x000000FF`
That’s a direct invitation to avoid string parsing and build branch-free routing:
- if trace-flags bit 0 (“sampled”) is set, keep the log
- otherwise, drop or downsample
Bit layout (little-endian on the wire, but logically a 32-bit word)
| bits | meaning |
|---|---|
| 0..7 | trace flags |
| 8..31 | reserved |
A practical gotcha: fixed32 is wire type 5 (I32) and encoded as 4 raw bytes little-endian.
So if you see a tag with wire type I32, your next 4 bytes are the value.
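Putting the two together, a minimal sketch of branch-light sampled-bit routing (the `is_sampled` helper is mine; the mask constant matches the proto definition quoted above):

```rust
const LOG_RECORD_FLAGS_TRACE_FLAGS_MASK: u32 = 0x0000_00FF;
const SAMPLED: u32 = 0x01; // W3C trace-flags bit 0

/// Decode a fixed32 `flags` field (wire type I32: 4 raw little-endian bytes)
/// and test the sampled bit without any string parsing.
fn is_sampled(le_bytes: [u8; 4]) -> bool {
    let flags = u32::from_le_bytes(le_bytes);
    (flags & LOG_RECORD_FLAGS_TRACE_FLAGS_MASK & SAMPLED) != 0
}

fn main() {
    assert!(is_sampled([0x01, 0x00, 0x00, 0x00]));
    assert!(!is_sampled([0x00, 0x00, 0x00, 0x00]));
    // Reserved high bits don't affect the decision.
    assert!(!is_sampled([0x00, 0x00, 0x00, 0xFF]));
}
```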
Rust: a “skim decoder” for OTLP logs (count records, sample, extract a few keys)
Sometimes you don’t want to fully decode OTLP logs; you want a fast path to:
- count `LogRecord`s
- pull `severity_number`, `time_unix_nano`
- optionally extract `body.string_value` if present
…and otherwise skip bytes.
Below is a deliberately low-level protobuf reader:
- pointer-based varint decode
- skip unknown fields by wire type
- no allocations
- no `prost` structs
Core: varint + tag split
```rust
use core::ptr;

#[inline(always)]
unsafe fn read_u64_varint(mut p: *const u8, end: *const u8) -> Option<(u64, *const u8)> {
    let mut x: u64 = 0;
    let mut shift = 0;
    // shift < 70 caps the loop at 10 bytes, the maximum for a u64 varint.
    while p < end && shift < 70 {
        let b = ptr::read(p);
        p = p.add(1);
        x |= ((b & 0x7f) as u64) << shift;
        if (b & 0x80) == 0 {
            return Some((x, p));
        }
        shift += 7;
    }
    None
}

#[inline(always)]
fn split_tag(tag: u64) -> (u32, u8) {
    let wire = (tag & 0x7) as u8;
    let field = (tag >> 3) as u32;
    (field, wire)
}
```

Skip logic: make unknown fields cheap
```rust
#[inline(always)]
unsafe fn skip_field(wire: u8, p: *const u8, end: *const u8) -> Option<*const u8> {
    match wire {
        0 => { // VARINT
            let (_v, p2) = read_u64_varint(p, end)?;
            Some(p2)
        }
        1 => { // I64
            if end.offset_from(p) < 8 { return None; }
            Some(p.add(8))
        }
        2 => { // LEN
            let (len, p2) = read_u64_varint(p, end)?;
            let len = len as isize;
            if end.offset_from(p2) < len { return None; }
            Some(p2.offset(len))
        }
        5 => { // I32
            if end.offset_from(p) < 4 { return None; }
            Some(p.add(4))
        }
        _ => None, // groups deprecated; treat as invalid
    }
}
```

Skim AnyValue for string_value only
This is the trick that keeps your hot path from exploding: if your backend only needs the string body most of the time, you don’t decode arrays/maps.
```rust
/// Returns the string bytes if AnyValue is a string_value, otherwise None.
unsafe fn anyvalue_string<'a>(mut p: *const u8, end: *const u8) -> Option<(&'a [u8], *const u8)> {
    while p < end {
        let (tag, p2) = read_u64_varint(p, end)?;
        p = p2;
        let (field, wire) = split_tag(tag);
        // AnyValue.string_value = 1 (LEN)
        if field == 1 && wire == 2 {
            let (len, p3) = read_u64_varint(p, end)?;
            let len = len as isize;
            if end.offset_from(p3) < len { return None; }
            let bytes = core::slice::from_raw_parts(p3, len as usize);
            return Some((bytes, p3.offset(len)));
        }
        // skip other oneof arms
        p = skip_field(wire, p, end)?;
    }
    // No string_value arm found: report that instead of an empty slice.
    None
}
```

This code is intentionally “unsafe and boring” because it matches the wire format precisely.
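For readers who want to try the skim without raw pointers, here is a safe, slice-based sketch of the same idea. It is not the pointer version above, just its logic on slices, and the helper names are mine:

```rust
/// Return the string_value bytes of an AnyValue if that oneof arm is
/// present; skip every other arm; otherwise None.
fn anyvalue_string(mut buf: &[u8]) -> Option<&[u8]> {
    while !buf.is_empty() {
        let (tag, rest) = read_varint(buf)?;
        let (field, wire) = ((tag >> 3) as u32, (tag & 7) as u8);
        buf = rest;
        match (field, wire) {
            (1, 2) => { // string_value: field 1, LEN
                let (len, rest) = read_varint(buf)?;
                return rest.get(..len as usize);
            }
            (_, 0) => buf = read_varint(buf)?.1, // VARINT arms (bool, int)
            (_, 2) => {                          // other LEN arms (bytes, array, kvlist)
                let (len, rest) = read_varint(buf)?;
                buf = rest.get(len as usize..)?;
            }
            (_, 1) => buf = buf.get(8..)?,       // I64 (double)
            (_, 5) => buf = buf.get(4..)?,       // I32
            _ => return None,
        }
    }
    None
}

fn read_varint(buf: &[u8]) -> Option<(u64, &[u8])> {
    let mut x = 0u64;
    for (i, &b) in buf.iter().enumerate().take(10) {
        x |= ((b & 0x7f) as u64) << (7 * i as u32);
        if b & 0x80 == 0 { return Some((x, &buf[i + 1..])); }
    }
    None
}

fn main() {
    // AnyValue { string_value: "hi" } -> 0A 02 68 69
    assert_eq!(anyvalue_string(&[0x0A, 0x02, 0x68, 0x69]), Some(&b"hi"[..]));
    // AnyValue { bool_value: true } -> 10 01 (field 2, VARINT): not a string.
    assert_eq!(anyvalue_string(&[0x10, 0x01]), None);
}
```

The safe version pays for bounds checks the pointer version elides, but the shape of the hot loop — tag, dispatch on wire type, skip or slice — is identical.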
Where SIMD helps (and where it doesn’t)
- SIMD can help with varint termination scanning (find the first byte where MSB=0), but:
  - OTLP log payloads often have many small varints (1 byte) and many LEN fields where the expensive part is hashing keys and UTF-8 validation, not varint math.
In other words: don’t write SIMD until you’ve proven your hot path is “varint-bound”. It usually isn’t.
Architectural trade-offs: OTLP logs vs “just ship JSON”
OTLP logs (protobuf) trade-offs
- Pros
  - typed values (`int64`, `double`, nested arrays)
  - consistent schema and semantic conventions
  - can be transported over OTLP/gRPC with backpressure
- Cons
  - maps become nested messages (`KeyValueList` → `KeyValue` → `AnyValue`)
  - decoding cost is dominated by length-delimited blobs + nested recursion
  - string keys repeat (unless you introduce dictionary-like schemes, which are not broadly used for logs)
Loki endpoints: native push vs OTLP ingest
Loki’s docs show both:
- `POST /loki/api/v1/push` (native Loki push)
- `POST /otlp/v1/logs` (OTLP logs ingest)
This matters operationally:
- Loki-native push has a well-understood “streams + entries” model and can be optimized for Loki’s internal chunking.
- OTLP logs ingest has to translate from `ResourceLogs`/`ScopeLogs`/`LogRecord` and decode `AnyValue` (including deep KVLists), which can be CPU-expensive depending on your attribute shape.
Go vs Rust (the uncomfortable truth)
Rust can win on:
- skipping unknown fields cheaply
- pointer-based parsing with minimal bounds checks
- avoiding allocations in the fast path
Go can win because:
- the protobuf + gRPC stacks are brutally production-hardened
- CPU profiles often show Go “wasting less time” on string handling due to runtime optimizations and mature libraries
- the integration costs (load shedding, queues, backpressure, retry) are easier to get correct
I’ve repeatedly seen “a slower decoder” beat “a faster decoder” because it sits inside a better-shaped pipeline.
Provocative conclusion: the structured-logs paradox
Structured logs promise “more query power” because you ship more structure.
But the moment you ship more structure, you pay for:
- repeated keys,
- nested TLV re-encoding,
- and deep `AnyValue` trees that have to be traversed somewhere.
Research Question:
Can we design an OTLP-compatible log transport that keeps the semantic model but adds dictionary + columnar encoding for attributes (à la Parquet), so “high-cardinality keys” stop dominating CPU?
If we can, why do some Go pipelines still outperform custom Rust decoders — is the limiting factor really parsing, or the emergent behavior of batching, queues, and backpressure under bursty log loads?