OTLP Logs on the Wire: AnyValue’s Oneof Tags, Attribute KVLists, and a Zero-Allocation Rust Protobuf Fast-Path

Logs are “just strings” right up until you ship structured logs over OTLP and discover:

  • every attribute is another nested protobuf message,
  • AnyValue is a oneof (branching on the wire),
  • and the cost is dominated by length-delimited blobs (wire type 2) with varint lengths.

This deep dive is about the exact bytes involved when you send OTLP logs (over gRPC or HTTP):

  • protobuf tags ((field_number << 3) | wire_type) and varints
  • the on-the-wire layout of ExportLogsServiceRequest
  • how AnyValue encodes its oneof
  • why KeyValueList exists (and how it hurts)
  • a pointer-based Rust decoder that can “skim” an OTLP payload without allocations

If you can read these bytes, you can explain why:

  • your Collector CPU spikes when someone adds a 2KB JSON blob to body,
  • Loki ingest via OTLP behaves differently from Loki’s native push API,
  • and why “protobuf is fast” becomes untrue once you embed maps-of-maps.


Layer 0 recap: protobuf records are TLV-ish (tag + payload)

From the protobuf wire format:

  • each field record starts with a tag varint
  • tag = (field_number << 3) | wire_type
  • for wire type LEN (2) the payload is: len_varint + len bytes

So most OTLP payloads look like an endless stream of:

[tag varint][len varint][len bytes]
[tag varint][len varint][len bytes]
...
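To make that tape concrete, here is a minimal encoder sketch (the helper names are mine, not from any OTLP SDK) that emits exactly such records:

```rust
/// Append a u64 as a protobuf varint: 7 payload bits per byte, MSB = "more bytes follow".
fn put_varint(buf: &mut Vec<u8>, mut v: u64) {
    while v >= 0x80 {
        buf.push((v as u8 & 0x7f) | 0x80);
        v >>= 7;
    }
    buf.push(v as u8);
}

/// Append one LEN-typed record: tag varint, length varint, payload bytes.
fn put_len_field(buf: &mut Vec<u8>, field: u32, payload: &[u8]) {
    put_varint(buf, ((field as u64) << 3) | 2); // wire type 2 = LEN
    put_varint(buf, payload.len() as u64);
    buf.extend_from_slice(payload);
}
```

A whole message is just these records back to back; `put_varint(&mut b, 300)` emits `AC 02` (0x2C with the continuation bit set, then 0x02).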

Mermaid: the byte tape you’re actually parsing

flowchart LR
  subgraph REC["Protobuf record (wire type = LEN)"]
    T["tag varint\n(field<<3 | 2)"] --> L["len varint"] --> P["payload bytes (len)"]
  end

Operational implication: OTLP logs ingest is fundamentally scan + varint + bounds-check. If your implementation copies slices around, you lose.


ExportLogsServiceRequest: why the first byte is usually 0x0A

The OTLP logs service request is:

message ExportLogsServiceRequest {
  repeated opentelemetry.proto.logs.v1.ResourceLogs resource_logs = 1;
}

Field 1, wire type LEN=2 → tag = (1 << 3) | 2 = 0x08 | 0x02 = 0x0A.

So a request containing exactly one ResourceLogs typically begins as:

bytes  meaning
0A     tag: field 1, LEN
..     varint length of the embedded ResourceLogs message
..     ResourceLogs bytes

This is not trivia: if you’re sampling or applying ingress limits, you can detect “payload is OTLP logs” and count nested elements without a full decode.
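A sketch of such an ingress check (function names are illustrative, not a real library API): require tag 0x0A at the top level and hop over each ResourceLogs by its length varint, never decoding the contents.

```rust
/// Read a varint from `buf` starting at `i`; returns (value, next index) or None.
fn read_varint(buf: &[u8], mut i: usize) -> Option<(u64, usize)> {
    let (mut x, mut shift) = (0u64, 0u32);
    while i < buf.len() && shift < 64 {
        let b = buf[i];
        i += 1;
        x |= ((b & 0x7f) as u64) << shift;
        if b & 0x80 == 0 {
            return Some((x, i));
        }
        shift += 7;
    }
    None
}

/// Count top-level resource_logs entries (field 1, LEN) without decoding them.
fn count_resource_logs(buf: &[u8]) -> Option<usize> {
    let (mut i, mut n) = (0usize, 0usize);
    while i < buf.len() {
        let (tag, j) = read_varint(buf, i)?;
        // Only field 1 / wire type LEN is valid at the top of this message.
        if tag != 0x0A {
            return None;
        }
        let (len, k) = read_varint(buf, j)?;
        i = k.checked_add(len as usize)?;
        if i > buf.len() {
            return None;
        }
        n += 1;
    }
    Some(n)
}
```

This touches only the tag and length varints, so the cost is proportional to the number of records, not their size.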


The real villain: AnyValue (a oneof that turns types into tags)

AnyValue is defined as a oneof:

message AnyValue {
  oneof value {
    string string_value = 1;
    bool bool_value = 2;
    int64 int_value = 3;
    double double_value = 4;
    ArrayValue array_value = 5;
    KeyValueList kvlist_value = 6;
    bytes bytes_value = 7;
  }
}

On the wire, the oneof is simply “which field number appears”.

Concrete: encoding AnyValue { string_value = "hi" }

  • field = 1
  • wire type = LEN (string)
  • tag = (1<<3)|2 = 0x0A

Bytes:

offset  byte(s)  meaning
0       0A       tag: string_value (field 1, LEN)
1       02       length = 2
2..3    68 69    ASCII 'h' 'i'

That’s the cheap case.
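The same bytes fall out of a hand-rolled encoder; a sketch (the function is mine, not an SDK API):

```rust
/// Append a u64 as a protobuf varint.
fn put_varint(buf: &mut Vec<u8>, mut v: u64) {
    while v >= 0x80 {
        buf.push((v as u8 & 0x7f) | 0x80);
        v >>= 7;
    }
    buf.push(v as u8);
}

/// Encode AnyValue { string_value = s }: tag 0x0A, length varint, UTF-8 bytes.
fn encode_anyvalue_string(s: &str) -> Vec<u8> {
    let mut buf = vec![0x0A]; // (1 << 3) | 2
    put_varint(&mut buf, s.len() as u64);
    buf.extend_from_slice(s.as_bytes());
    buf
}
```

Note that strings up to 127 bytes keep a 1-byte length prefix; at 128 bytes the length varint itself grows to 2 bytes.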

The expensive case: kvlist_value (maps become repeated messages)

kvlist_value is field 6, LEN type → tag = (6<<3)|2 = 0x32.

And inside KeyValueList:

message KeyValueList { repeated KeyValue values = 1; }
message KeyValue { string key = 1; AnyValue value = 2; }

So a single “map entry” becomes (at minimum):

  • one LEN field for the KeyValue message
  • inside it: one LEN field for key
  • and one LEN field for embedded AnyValue

In other words: maps inflate into nested TLV chains.

Mermaid: KeyValue as nested LEN fields

flowchart TB
  KV["KeyValue (embedded message)"] --> K["1: key (LEN string)"]
  KV --> V["2: value (LEN AnyValue)"]
  V --> ONEOF["AnyValue oneof field\n(1..7)"]
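The inflation is easy to measure with a sketch encoder for a single {"k": "v"} entry (helper names are hypothetical): two payload bytes cost twelve on the wire.

```rust
/// Append a u64 as a protobuf varint.
fn put_varint(buf: &mut Vec<u8>, mut v: u64) {
    while v >= 0x80 {
        buf.push((v as u8 & 0x7f) | 0x80);
        v >>= 7;
    }
    buf.push(v as u8);
}

/// Append one LEN-typed field: tag varint, length varint, payload bytes.
fn put_len(buf: &mut Vec<u8>, field: u32, payload: &[u8]) {
    put_varint(buf, ((field as u64) << 3) | 2);
    put_varint(buf, payload.len() as u64);
    buf.extend_from_slice(payload);
}

/// AnyValue { kvlist_value: { values: [ KeyValue { key, AnyValue { string_value } } ] } }
fn encode_kvlist_entry(key: &str, val: &str) -> Vec<u8> {
    let mut inner_any = Vec::new();    // AnyValue.string_value = 1
    put_len(&mut inner_any, 1, val.as_bytes());

    let mut kv = Vec::new();           // KeyValue.key = 1, KeyValue.value = 2
    put_len(&mut kv, 1, key.as_bytes());
    put_len(&mut kv, 2, &inner_any);

    let mut kvlist = Vec::new();       // KeyValueList.values = 1
    put_len(&mut kvlist, 1, &kv);

    let mut any = Vec::new();          // AnyValue.kvlist_value = 6 → tag 0x32
    put_len(&mut any, 6, &kvlist);
    any
}
```

Four levels of LEN nesting means four tag+length pairs per leaf value, and every length varint has to be recomputed (or back-patched) on re-encode.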

LogRecord.flags: bit fields you should actually use

The logs proto defines LogRecord.flags as fixed32 and also defines LogRecordFlags:

  • “Bits 0-7 are used for trace flags.”
  • LOG_RECORD_FLAGS_TRACE_FLAGS_MASK = 0x000000FF

That’s a direct invitation to avoid string parsing and build branch-free routing:

  • if trace-flags bit 0 (“sampled”) is set, keep the log
  • otherwise, drop or downsample

Bit layout (little-endian on the wire, but logically a 32-bit word)

bits   meaning
0..7   trace flags
8..31  reserved

A practical gotcha: fixed32 is wire type 5 (I32) and encoded as 4 raw bytes little-endian.

So if you see a tag with wire type I32, your next 4 bytes are the value.
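A sketch of the routing decision (the mask mirrors LOG_RECORD_FLAGS_TRACE_FLAGS_MASK from the proto; the function names are mine):

```rust
/// Local mirror of LOG_RECORD_FLAGS_TRACE_FLAGS_MASK from the logs proto.
const TRACE_FLAGS_MASK: u32 = 0x0000_00FF;
/// W3C trace-context "sampled" flag lives in bit 0 of the trace flags.
const SAMPLED: u32 = 0x01;

/// fixed32 payload: 4 raw little-endian bytes -> u32.
fn read_fixed32(b: [u8; 4]) -> u32 {
    u32::from_le_bytes(b)
}

/// Keep the record iff the sampled bit is set; a mask and a compare, no parsing.
fn keep_record(flags: u32) -> bool {
    flags & TRACE_FLAGS_MASK & SAMPLED != 0
}
```

The mask matters: bits 8..31 are reserved, so a router must not treat them as trace flags.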


Rust: a “skim decoder” for OTLP logs (count records, sample, extract a few keys)

Sometimes you don’t want to fully decode OTLP logs; you want a fast path to:

  • count LogRecords
  • pull severity_number, time_unix_nano
  • optionally extract body.string_value if present

…and otherwise skip bytes.

Below is a deliberately low-level protobuf reader:

  • pointer-based varint decode
  • skip unknown fields by wire type
  • no allocations
  • no prost structs

Core: varint + tag split

use core::ptr;
 
#[inline(always)]
unsafe fn read_u64_varint(mut p: *const u8, end: *const u8) -> Option<(u64, *const u8)> {
    let mut x: u64 = 0;
    let mut shift = 0;
    while p < end && shift < 70 {
        let b = ptr::read(p);
        p = p.add(1);
        x |= ((b & 0x7f) as u64) << shift;
        if (b & 0x80) == 0 {
            return Some((x, p));
        }
        shift += 7;
    }
    None
}
 
#[inline(always)]
fn split_tag(tag: u64) -> (u32, u8) {
    let wire = (tag & 0x7) as u8;
    let field = (tag >> 3) as u32;
    (field, wire)
}

Skip logic: make unknown fields cheap

#[inline(always)]
unsafe fn skip_field(wire: u8, mut p: *const u8, end: *const u8) -> Option<*const u8> {
    match wire {
        0 => { // VARINT
            let (_v, p2) = read_u64_varint(p, end)?;
            Some(p2)
        }
        1 => { // I64
            if end.offset_from(p) < 8 { return None; }
            Some(p.add(8))
        }
        2 => { // LEN
            let (len, p2) = read_u64_varint(p, end)?;
            let len = len as isize;
            if end.offset_from(p2) < len { return None; }
            Some(p2.offset(len))
        }
        5 => { // I32
            if end.offset_from(p) < 4 { return None; }
            Some(p.add(4))
        }
        _ => None, // groups deprecated; treat as invalid
    }
}

Skim AnyValue for string_value only

This is the trick that keeps your hot path from exploding: if your backend only needs the string body most of the time, you don’t decode arrays/maps.

/// Returns the raw string bytes (and the position after them) if the AnyValue
/// is a string_value; otherwise None.
unsafe fn anyvalue_string<'a>(mut p: *const u8, end: *const u8) -> Option<(&'a [u8], *const u8)> {
    while p < end {
        let (tag, p2) = read_u64_varint(p, end)?;
        p = p2;
        let (field, wire) = split_tag(tag);
 
        // AnyValue.string_value = 1 (LEN)
        if field == 1 && wire == 2 {
            let (len, p3) = read_u64_varint(p, end)?;
            let len = len as isize;
            if end.offset_from(p3) < len { return None; }
            let bytes = core::slice::from_raw_parts(p3, len as usize);
            return Some((bytes, p3.offset(len)));
        }
 
        // skip other oneof arms
        p = skip_field(wire, p, end)?;
    }
    None
}

This code is intentionally “unsafe and boring” because it matches the wire format precisely.
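As a safe cross-check, the same skim can be written against a slice with explicit bounds checks but identical field logic (names here are mine); a minimal sketch:

```rust
/// Read a varint from `buf` starting at `i`; returns (value, next index) or None.
fn read_varint(buf: &[u8], mut i: usize) -> Option<(u64, usize)> {
    let (mut x, mut shift) = (0u64, 0u32);
    while i < buf.len() && shift < 64 {
        let b = buf[i];
        i += 1;
        x |= ((b & 0x7f) as u64) << shift;
        if b & 0x80 == 0 {
            return Some((x, i));
        }
        shift += 7;
    }
    None
}

/// Slice-based skim: return string_value bytes if this AnyValue is a string, else None.
fn anyvalue_string_safe(buf: &[u8]) -> Option<&[u8]> {
    let mut i = 0;
    while i < buf.len() {
        let (tag, j) = read_varint(buf, i)?;
        let (field, wire) = ((tag >> 3) as u32, (tag & 7) as u8);
        if field == 1 && wire == 2 {
            // AnyValue.string_value = 1 (LEN)
            let (len, k) = read_varint(buf, j)?;
            return buf.get(k..k.checked_add(len as usize)?);
        }
        // Skip any other oneof arm by wire type.
        i = match wire {
            0 => read_varint(buf, j)?.1,  // VARINT
            1 => j.checked_add(8)?,       // I64
            2 => {                        // LEN
                let (len, k) = read_varint(buf, j)?;
                k.checked_add(len as usize)?
            }
            5 => j.checked_add(4)?,       // I32
            _ => return None,             // groups: treat as invalid
        };
    }
    None
}
```

Useful as a differential test target for the pointer version: both must agree on every input, and the safe one is trivially fuzzable.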

Where SIMD helps (and where it doesn’t)

  • SIMD can help with varint termination scanning (finding the first byte whose MSB is 0).
  • But OTLP log payloads often contain many one-byte varints, and for LEN fields the expensive part is hashing keys and UTF-8 validation, not varint math.

In other words: don’t write SIMD until you’ve proven your hot path is “varint-bound”. It usually isn’t.


Architectural trade-offs: OTLP logs vs “just ship JSON”

OTLP logs (protobuf) trade-offs

  • Pros

    • typed values (int64, double, nested arrays)
    • consistent schema and semantic conventions
    • can be transported over OTLP/gRPC with backpressure
  • Cons

    • maps become nested messages (KeyValueList → KeyValue → AnyValue)
    • decoding cost is dominated by length-delimited blobs + nested recursion
    • string keys repeat (unless you introduce dictionary-like schemes — not broadly used for logs)

Loki endpoints: native push vs OTLP ingest

Loki’s docs show both:

  • POST /loki/api/v1/push (native Loki push)
  • POST /otlp/v1/logs (OTLP logs ingest)

This matters operationally:

  • Loki-native push has a well-understood “streams + entries” model and can be optimized for Loki’s internal chunking.
  • OTLP logs ingest has to translate from ResourceLogs/ScopeLogs/LogRecord and decode AnyValue (including deep KVLists), which can be CPU-expensive depending on your attribute shape.

Go vs Rust (the uncomfortable truth)

Rust can win on:

  • skipping unknown fields cheaply
  • pointer-based parsing with minimal bounds checks
  • avoiding allocations in the fast path

Go can win because:

  • the protobuf + gRPC stacks are brutally production-hardened
  • CPU profiles often show Go “wasting less time” on string handling due to runtime optimizations and mature libraries
  • the integration costs (load shedding, queues, backpressure, retry) are easier to get correct

I’ve repeatedly seen “a slower decoder” beat “a faster decoder” because it sits inside a better-shaped pipeline.


Provocative conclusion: the structured-logs paradox

Structured logs promise “more query power” because you ship more structure.

But the moment you ship more structure, you pay for:

  • repeated keys,
  • nested TLV re-encoding,
  • and deep AnyValue trees that have to be traversed somewhere.

Research Question:

Can we design an OTLP-compatible log transport that keeps the semantic model but adds dictionary + columnar encoding for attributes (à la Parquet), so “high-cardinality keys” stop dominating CPU?

If we can, why do some Go pipelines still outperform custom Rust decoders — is the limiting factor really parsing, or the emergent behavior of batching, queues, and backpressure under bursty log loads?