Reverse-engineering a binary protocol: protobuf, MessagePack, and the wire format

You intercept a request from some app’s backend and the body is a smear of bytes. No JSON, no XML, no field names. Just 08 96 01 12 04 ... and a content-type that says application/grpc or application/x-protobuf or nothing useful at all. You have no .proto file, no schema, no documentation. The natural assumption is that you are stuck, that the bytes are opaque until you find the schema. That assumption is wrong, and the reason it is wrong is the most useful thing to understand about binary protocols.

Most of these formats leak their own structure. Protobuf tells you the field number and the broad shape of every value even when it refuses to tell you the name or the meaning. MessagePack puts a type tag in front of every single value. gRPC wraps each message in a fixed five-byte header you can read with a ruler. The schema is missing, but the skeleton is right there in the bytes, and once you can see the skeleton you can recover most of the meaning by reading values and matching them against what the app does. This post is about how to read that skeleton, grounded in the actual specifications rather than folklore.

The plan: I start with the protobuf wire format in detail, because it is the one that looks scariest and rewards understanding the most. Varints, the field tag, the six wire types, ZigZag, length-delimited records, and the specific ambiguity that makes schema-less decoding a guessing game rather than a clean inversion. Then MessagePack, which takes the opposite design decision and is self-describing. Then gRPC framing on top of HTTP/2, since that is how most modern protobuf reaches you. Then the tooling that automates the reading: protoscope, blackboxprotobuf, pbtk, and the interactive editors built into mitmproxy. I close with the part nobody writes down, which is how you turn a decoded skeleton into actual field meanings.

The protobuf wire format is simpler than its reputation

Protocol Buffers, Google’s serialization format, has a reputation for being impenetrable without the schema. The wire format itself is small. The official encoding guide fits the whole thing on one page, and the core idea is that a message on the wire is a flat sequence of records, each record being a key followed by a value. The key carries a field number and a wire type. The value’s interpretation depends on that wire type. There is no message length at the front, no field names anywhere, no type names. The decoder is expected to bring its own schema to map field numbers to names and declared types.

That last sentence is the whole reason schema-less decoding is possible at all. The format separates structure from meaning. The structure (which field number, which wire type, where the value ends) is fully self-describing on the wire. Only the meaning (what field 3 is called, whether it is an int32 or an enum or a bool) lives in the schema. Strip the schema and you lose names and exact types. You do not lose the shape.

Everything starts with the varint, the variable-length integer encoding that protobuf uses for tags, lengths, and most numeric values.

Varints, the continuation bit, and little-endian groups of seven

A varint stores an integer in one to ten bytes. Each byte carries seven bits of the actual number in its low seven bits. The high bit, the most significant bit of the byte, is a continuation flag: when it is set, another byte follows; when it is clear, this is the last byte of the varint. The seven-bit groups are stored least significant group first, which is to say little-endian at the group level.

The canonical example from the spec is the number 150. It encodes as the two bytes 96 01. Reading that back: take 96 01, which in bits is 1001 0110 0000 0001. Drop the most significant bit of each byte to get the payload groups 001 0110 and 000 0001. Reverse the group order because varints are little-endian, giving 000 0001 then 001 0110, concatenate to 000 0001 001 0110, and that binary number is 150. The continuation bit on the first byte was set, telling you to read the second; the second’s was clear, telling you to stop.

*How the two bytes 0x96 0x01 decode to 150: the high bit of each byte is a continuation flag, the remaining seven-bit groups are concatenated in reverse (little-endian) order.*
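The decoding steps above can be sketched in a few lines of Python. This is a minimal illustration rather than a hardened parser; the name read_varint is mine, not something from the spec or any library.

```python
def read_varint(buf: bytes, pos: int = 0) -> tuple:
    """Decode one varint starting at pos; return (value, next_pos)."""
    value = 0
    shift = 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift  # low seven bits are payload
        if not byte & 0x80:              # clear high bit: this was the last byte
            return value, pos
        shift += 7                       # the next group is more significant
        if shift >= 70:
            raise ValueError("varint longer than ten bytes")


print(read_varint(bytes.fromhex("9601")))  # (150, 2)
```

Returning the next position alongside the value is a design choice that makes it easy to keep walking the buffer after each varint.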

Two practical consequences fall out of this. First, small numbers are cheap and large numbers are expensive, so protobuf encoders want their most common values to be small, which is why field numbers 1 through 15 are precious (they fit the tag in a single byte). Second, a negative number stored as a plain int32 or int64 is always encoded as if it were a very large unsigned value, which means it always takes the full ten bytes. That waste is exactly what ZigZag fixes, and I will come back to it.

The field tag: field number and wire type in one varint

Every record begins with a tag, and the tag is itself a varint. It packs two things: the field number and the wire type. The formula from the spec is (field_number << 3) | wire_type. The low three bits are the wire type; everything above them is the field number.

Take the byte 08, the first byte of nearly every protobuf message you will ever stare at. In binary that is 0000 1000. The continuation bit is clear, so the whole tag is one byte. The low three bits are 000, wire type 0. The remaining bits, 00001, are field number 1. So 08 means “field 1, wire type VARINT,” and the bytes that follow are a varint value. Combine that with the earlier example and 08 96 01 is the complete encoding of a message where field 1 holds the value 150. Three bytes, fully decoded, no schema required to see the structure.

*The tag byte splits into a wire type (low three bits, orange) and a field number (the rest). 0x08 is field 1, wire type VARINT.*
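The tag arithmetic is small enough to show directly. A sketch, assuming the tag has already been decoded from its varint into an integer (for field numbers up to 15 that is just the single byte); split_tag and WIRE_TYPES are illustrative names.

```python
# Wire type names as listed in the protobuf encoding guide.
WIRE_TYPES = {0: "VARINT", 1: "I64", 2: "LEN", 3: "SGROUP", 4: "EGROUP", 5: "I32"}


def split_tag(tag: int) -> tuple:
    """Split a decoded tag into (field_number, wire_type_name)."""
    field_number = tag >> 3   # everything above the low three bits
    wire_type = tag & 0x07    # the low three bits
    return field_number, WIRE_TYPES.get(wire_type, f"UNKNOWN({wire_type})")


print(split_tag(0x08))  # (1, 'VARINT')
print(split_tag(0x12))  # (2, 'LEN')
```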

The six wire types and what each one tells the decoder

There are six wire types, and only four of them matter in practice because two are deprecated groups. The encoding guide lists them by their numeric value, which is what you read out of the low three bits of every tag.

Wire type 0 is VARINT. The value is a single varint, used for int32, int64, uint32, uint64, sint32, sint64, bool, and enum. Wire type 1 is I64, a fixed eight bytes, used for fixed64, sfixed64, and double. Wire type 2 is LEN, the length-delimited type, where a varint length immediately follows the tag and that many bytes of payload follow the length; this carries string, bytes, embedded messages, and packed repeated fields. Wire type 5 is I32, a fixed four bytes, used for fixed32, sfixed32, and float. Wire types 3 and 4 are SGROUP and EGROUP, the start and end markers of the deprecated group feature, which you will almost never see in modern traffic.

The decoder needs only the wire type to know where a record ends. VARINT ends at the first byte with a clear continuation bit. I64 and I32 are always eight and four bytes. LEN reads its length prefix and skips exactly that many bytes. That is why you can walk an entire protobuf message you have never seen before, record by record, and never get lost: the wire type at each tag tells you precisely how far to advance. You can do it with a pencil.

*The six wire types. Four are live; the two group types are deprecated. The wire type alone is enough to find every record boundary without a schema.*
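Here is what that pencil walk looks like as code. It is a sketch that handles the four live wire types and raises on the deprecated group types; walk and read_varint are names I made up for the illustration.

```python
def read_varint(buf: bytes, pos: int) -> tuple:
    """Decode one varint at pos; return (value, next_pos)."""
    value, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def walk(buf: bytes):
    """Yield (field_number, wire_type, raw_value) for each top-level record."""
    pos = 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        field, wire_type = tag >> 3, tag & 0x07
        if wire_type == 0:                      # VARINT
            value, pos = read_varint(buf, pos)
        elif wire_type == 1:                    # I64: always eight bytes
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire_type == 2:                    # LEN: length prefix, then payload
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire_type == 5:                    # I32: always four bytes
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        if pos > len(buf):
            raise ValueError("record runs past end of buffer")
        yield field, wire_type, value


message = bytes.fromhex("0896011204") + b"test"
print(list(walk(message)))  # [(1, 0, 150), (2, 2, b'test')]
```

Fixed-width and LEN values are left as raw bytes on purpose: without a schema, interpreting them is a separate, later step.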

ZigZag and the negative-number problem

Plain varints waste space on negative numbers. A -1 stored as a two’s-complement 64-bit value is all ones, which encodes as the maximum varint, ten bytes. Protobuf offers the sint32 and sint64 types to fix this, and they use a transform called ZigZag before varint encoding.

ZigZag maps signed integers to unsigned ones so that small magnitudes, positive or negative, become small unsigned numbers. The encoding guide states the rule directly: positive integers p become 2 * p, the even numbers, and negative integers n become 2 * |n| - 1, the odd numbers. So 0 maps to 0, -1 to 1, 1 to 2, -2 to 3, 2 to 4, and so on, zig-zagging across the number line, which is where the name comes from. Implemented at the bit level it is (n << 1) ^ (n >> 31) for 32-bit and (n << 1) ^ (n >> 63) for 64-bit, where the shift is arithmetic.

This matters for schema-less decoding in a subtle way. When you see a varint field and you do not know whether the original type was int32 or sint32, the raw decoded number is ambiguous. If the field was ZigZag-encoded, the raw varint 1 actually meant -1, and 2 meant 1. A decoder that does not know the type will show you the raw value, and you have to recognize the ZigZag pattern (small alternating values, no huge ten-byte varints for what should be small negatives) to guess the field used sint. This is one of several places where the missing schema costs you certainty and you fall back on pattern recognition.
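A short sketch of the transform in both directions, useful for eyeballing whether a column of raw varints makes more sense as sint values. The function names are mine; the bit tricks follow the formulas quoted above, adapted to Python's unbounded integers.

```python
def zigzag_encode(n: int, bits: int = 64) -> int:
    """Map a signed integer to its unsigned ZigZag form."""
    # Python's >> on negative ints is arithmetic, matching the spec's formula;
    # the mask trims the result to the chosen width.
    return ((n << 1) ^ (n >> (bits - 1))) & ((1 << bits) - 1)


def zigzag_decode(z: int) -> int:
    """Recover the signed integer from a raw ZigZag varint value."""
    return (z >> 1) ^ -(z & 1)


print([zigzag_decode(raw) for raw in (0, 1, 2, 3, 4)])  # [0, -1, 1, -2, 2]
print(zigzag_encode(-1))                                # 1
```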

Length-delimited records and the ambiguity at the heart of it

Wire type 2, LEN, is where schema-less decoding stops being mechanical and starts being interpretive. The format is clean: tag, then a varint giving the byte length, then exactly that many bytes of payload. The spec describes the length as a varint immediately after the tag, followed by the payload. In practice a payload cannot exceed 2 GiB, but that ceiling comes from the implementations, which cap serialized messages at that size, not from the varint, which could express far larger lengths. Walking past a LEN record is trivial; you read the length and skip.

The problem is what the payload is. The same wire type 2 carries four genuinely different things: a UTF-8 string, an arbitrary byte blob, a nested protobuf message, and a packed repeated field of primitives. The wire format gives you no flag to distinguish them. A nested message is just bytes that happen to be a valid protobuf message; a packed repeated field of varints is just bytes that happen to be a sequence of varints; a string is just bytes that happen to be valid UTF-8. They overlap. A short ASCII string can parse as a (nonsensical) protobuf message. A nested message can parse as a string if its bytes are printable.

*A length-delimited field can be a string, raw bytes, a nested message, or a packed array. The wire format carries no flag to tell them apart, which is the central ambiguity of schema-less protobuf.*

The usual heuristic, which is exactly what the automated tools implement, is to try to parse the payload as a nested message first. If every byte is consumed cleanly as a sequence of valid tag-plus-value records, treat it as a nested message. Otherwise, if the bytes are valid printable UTF-8, treat it as a string. Otherwise call it raw bytes. The heuristic is good and wrong often enough to matter. A four-byte string of the right shape parses as a tiny nested message; a nested message whose bytes are all printable shows up as a string. Packed repeated fields are the worst case, because they look like neither and the tools generally cannot detect them without help. The protobuf docs themselves note that packed fields lack the per-element tags that would otherwise mark structure.
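To make that heuristic concrete, here is a rough sketch of the message-then-string-then-bytes cascade. It is my own simplified version, not the code of any tool mentioned in this post, and the names classify_len and parses_as_message are hypothetical.

```python
def read_varint(buf: bytes, pos: int) -> tuple:
    value, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def parses_as_message(buf: bytes) -> bool:
    """True if buf is consumed exactly by well-formed, non-group records."""
    if not buf:
        return False
    pos = 0
    try:
        while pos < len(buf):
            tag, pos = read_varint(buf, pos)
            field, wire_type = tag >> 3, tag & 0x07
            if field == 0:                 # field number 0 is never valid
                return False
            if wire_type == 0:
                _, pos = read_varint(buf, pos)
            elif wire_type == 1:
                pos += 8
            elif wire_type == 2:
                length, pos = read_varint(buf, pos)
                pos += length
            elif wire_type == 5:
                pos += 4
            else:                          # groups and unknown types: give up
                return False
    except ValueError:
        return False
    return pos == len(buf)


def classify_len(payload: bytes) -> str:
    """Guess what a LEN payload is: 'message', 'string', or 'bytes'."""
    if parses_as_message(payload):
        return "message"
    try:
        text = payload.decode("utf-8")
    except UnicodeDecodeError:
        return "bytes"
    return "string" if text.isprintable() else "bytes"


print(classify_len(bytes.fromhex("089601")))  # message
print(classify_len(b"hello world"))           # string
print(classify_len(b"hi"))                    # message (a false positive)
```

The last line is the failure mode in miniature: "hi" is 0x68 0x69, which reads cleanly as field 13, wire type VARINT, value 105.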

Packed repeated fields deserve a sentence on their own. Since proto3, repeated fields of scalar numeric types are packed by default (proto2 needed an explicit [packed=true]), meaning all the elements share one LEN record with the values concatenated rather than each getting its own tag. That is efficient on the wire and miserable to decode blind, because a packed array of small varints is byte-for-byte indistinguishable from some other field’s nested-message or string payload. You only know it is a packed array because the numbers make sense when you read them that way and the app’s behavior confirms it.
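Reading a payload as a packed varint array is a one-loop job, which is exactly why it is worth trying by hand when a LEN field looks like neither a message nor a string. A sketch, with unpack_varints as an illustrative name; the first test payload is the packed example from the encoding guide.

```python
def read_varint(buf: bytes, pos: int) -> tuple:
    value, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def unpack_varints(payload: bytes) -> list:
    """Interpret a LEN payload as back-to-back varints with no tags."""
    values, pos = [], 0
    while pos < len(payload):
        value, pos = read_varint(payload, pos)
        values.append(value)
    return values


print(unpack_varints(bytes.fromhex("038e029ea705")))  # [3, 270, 86942]
print(unpack_varints(bytes.fromhex("0801")))          # [8, 1]
```

The second payload, 08 01, is also a perfectly valid nested message with field 1 set to 1, so the bytes alone cannot tell you which reading is right.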

What “non-canonical” encodings buy an attacker and a defender

One more property matters if you are on the defensive side of this. The protobuf wire format is not canonical. The same logical message has many valid byte encodings. Fields can appear in any order. A field can appear more than once: for scalars the last one wins, repeated fields accumulate every occurrence, and embedded messages are merged. Varints can be padded with redundant zero groups, so 150 can be written as 96 01 or as the overlong 96 81 00, and parsers generally accept both.

For someone building a parser or a security control, this is a trap. Two byte sequences that a strict comparison treats as different are the same message to the protobuf library, and two byte sequences a naive normalizer treats as identical can decode to different messages if a duplicate field flips a value. Anti-tamper and signature schemes that hash the raw protobuf bytes rather than a canonicalized form are fragile for exactly this reason. If you are reasoning about how an app validates or signs its payloads, the non-canonical nature of the format is one of the first things worth probing, in the same spirit as the parser-disagreement bugs behind HTTP request smuggling, where two implementations read the same bytes differently.
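A small demonstration of why hashing raw bytes is fragile. The decoder below is a deliberately tiny sketch that only understands varint fields and applies last-one-wins to duplicates; decode_scalars is a made-up name, and real libraries do far more.

```python
import hashlib


def read_varint(buf: bytes, pos: int) -> tuple:
    value, shift = 0, 0
    while True:
        if pos >= len(buf):
            raise ValueError("truncated varint")
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def decode_scalars(buf: bytes) -> dict:
    """Decode a message made only of VARINT fields; later duplicates overwrite."""
    fields, pos = {}, 0
    while pos < len(buf):
        tag, pos = read_varint(buf, pos)
        if tag & 0x07 != 0:
            raise ValueError("this sketch only handles wire type 0")
        value, pos = read_varint(buf, pos)
        fields[tag >> 3] = value
    return fields


canonical = bytes.fromhex("089601")      # field 1 = 150
overlong = bytes.fromhex("08968100")     # same value, padded varint
duplicate = bytes.fromhex("0801089601")  # field 1 = 1, then field 1 = 150

for raw in (canonical, overlong, duplicate):
    # Same decoded message each time, different digest each time.
    print(decode_scalars(raw), hashlib.sha256(raw).hexdigest()[:16])
```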

MessagePack made the opposite choice: self-describing bytes

Protobuf strips type information off the wire and expects a schema to put it back. MessagePack does the reverse. Every value carries its own type tag in the first byte, so a MessagePack blob is fully decodable into a typed structure with no schema at all. You lose field names (a map’s keys are whatever the encoder put there, often short strings or integers), but you never lose types, and you never have the LEN ambiguity that plagues protobuf, because a string, an array, a map, and a binary blob each have distinct prefix bytes.

The MessagePack specification lays out the format-byte map precisely. The cleverness is in how it packs small common values. A positive integer from 0 to 127 is stored as a single byte in the range 0x00 to 0x7f, because the format byte 0xxxxxxx is the value itself; this is positive fixint. Small negative integers from -32 to -1 are a single byte 0xe0 to 0xff, the negative fixint family with format 111yyyyy. A short string up to 31 bytes uses a fixstr prefix in 0xa0 to 0xbf, format 101xxxxx, where the low five bits are the length and the string bytes follow. A small map up to 15 pairs is fixmap, 0x80 to 0x8f, format 1000xxxx. A small array up to 15 elements is fixarray, 0x90 to 0x9f, format 1001xxxx.

*MessagePack's first-byte map. Because every value's type is in its prefix, a blob decodes to a typed tree with no schema, the opposite of protobuf's design.*

Above the fixed-prefix families sit the explicit-length ones. nil is 0xc0, false is 0xc2, true is 0xc3. Unsigned integers are 0xcc through 0xcf for 8, 16, 32, and 64 bits; signed are 0xd0 through 0xd3. Strings too long for fixstr use str 8 at 0xd9, str 16 at 0xda, str 32 at 0xdb. Binary blobs, which MessagePack distinguishes from strings (a distinction added in 2013 and worth knowing because older encoders conflated them), are bin 8/16/32 at 0xc4 through 0xc6. And the extension family, fixext at 0xd4 through 0xd8 and ext 8/16/32 at 0xc7 through 0xc9, lets an application stuff its own typed payloads in with a one-byte type code, which is where you find timestamps and custom objects.

The spec gives one rule that helps when you are reasoning about what an encoder produced: when a value fits several formats, a serializer SHOULD pick the one with the fewest bytes. So a small integer will be a fixint, not a uint 32, unless the encoder is lazy or pinning a width on purpose. That predictability makes MessagePack pleasant to read by hand. You walk the bytes, each prefix tells you the type and how many bytes to consume, and you reconstruct the tree. The only thing missing is the meaning of the map keys, and those are usually right there as strings.
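That walk is short enough to write out. The sketch below covers the fix families, nil and booleans, the sized integers, and the str and bin families; floats, the 16/32-bit arrays and maps, and ext are left out to keep it small, so treat it as a reading aid and not a replacement for a real MessagePack library. The function name unpack is my own.

```python
def unpack(buf: bytes, pos: int = 0) -> tuple:
    """Decode one MessagePack value at pos; return (value, next_pos)."""
    b = buf[pos]
    pos += 1
    if b <= 0x7F:                                  # positive fixint
        return b, pos
    if b >= 0xE0:                                  # negative fixint
        return b - 0x100, pos
    if 0x80 <= b <= 0x8F:                          # fixmap (keys must be hashable here)
        out = {}
        for _ in range(b & 0x0F):
            key, pos = unpack(buf, pos)
            value, pos = unpack(buf, pos)
            out[key] = value
        return out, pos
    if 0x90 <= b <= 0x9F:                          # fixarray
        items = []
        for _ in range(b & 0x0F):
            item, pos = unpack(buf, pos)
            items.append(item)
        return items, pos
    if 0xA0 <= b <= 0xBF:                          # fixstr
        n = b & 0x1F
        return buf[pos:pos + n].decode("utf-8"), pos + n
    if b == 0xC0:
        return None, pos
    if b in (0xC2, 0xC3):
        return b == 0xC3, pos
    if 0xCC <= b <= 0xCF:                          # uint 8/16/32/64, big-endian
        n = 1 << (b - 0xCC)
        return int.from_bytes(buf[pos:pos + n], "big"), pos + n
    if 0xD0 <= b <= 0xD3:                          # int 8/16/32/64, big-endian
        n = 1 << (b - 0xD0)
        return int.from_bytes(buf[pos:pos + n], "big", signed=True), pos + n
    if b in (0xD9, 0xDA, 0xDB, 0xC4, 0xC5, 0xC6):  # str 8/16/32, bin 8/16/32
        is_str = b >= 0xD9
        size = 1 << (b - (0xD9 if is_str else 0xC4))
        n = int.from_bytes(buf[pos:pos + size], "big")
        pos += size
        data = buf[pos:pos + n]
        return (data.decode("utf-8") if is_str else data), pos + n
    raise NotImplementedError(f"format byte 0x{b:02x} not handled in this sketch")


blob = bytes.fromhex("82a7636f6d70616374c3a6736368656d6100")
print(unpack(blob))  # ({'compact': True, 'schema': 0}, 18)
```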

MessagePack shows up in real traffic more than its low profile suggests. It is common in WebSocket payloads, in some game and chat protocols, in Redis tooling, and in places where someone wanted JSON’s data model without JSON’s size. When you intercept a WebSocket frame that is not valid JSON and whose first byte lands in the fixmap, fixarray, or fixstr ranges (0x80 to 0xbf), MessagePack is the first thing to test.

gRPC framing: protobuf wrapped for HTTP/2

Most protobuf you meet on a modern backend arrives inside gRPC, which means it is wrapped in two more layers before it reaches the wire: a gRPC length-prefix and an HTTP/2 transport. Understanding the wrapper matters because your interception tool shows you the wrapper first, and if you do not recognize it you will try to parse the five-byte gRPC header as protobuf and get nonsense.

The gRPC over HTTP/2 protocol spec defines the message framing as a Length-Prefixed-Message: a one-byte Compressed-Flag, a four-byte big-endian Message-Length, and then the message bytes. The flag is 0 or 1; when it is 1 the message body is compressed with whatever codec the grpc-encoding header names. The length is an unsigned 32-bit big-endian integer, which caps a single message at just under 4 GB. So the first five bytes of any gRPC message are pure framing. If you see 00 00 00 00 2a at the start of a DATA frame’s payload, that is “uncompressed, 42 bytes follow,” and the protobuf message starts at byte six.

*The five-byte gRPC prefix: a compression flag and a big-endian length. Strip those five bytes and the protobuf message begins. HTTP/2 frame boundaries are unrelated to message boundaries.*
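Stripping the prefix is a single struct call. A sketch, assuming you already hold the complete, reassembled body of a gRPC request or response; split_grpc_messages is an illustrative name, and decompression of flagged messages is left out.

```python
import struct


def split_grpc_messages(body: bytes):
    """Yield (compressed, message_bytes) for each Length-Prefixed-Message."""
    pos = 0
    while pos < len(body):
        if len(body) - pos < 5:
            raise ValueError("truncated gRPC prefix")
        # One flag byte, then a big-endian unsigned 32-bit length.
        compressed, length = struct.unpack_from(">BI", body, pos)
        pos += 5
        if pos + length > len(body):
            raise ValueError("message runs past end of body")
        yield bool(compressed), body[pos:pos + length]
        pos += length


body = bytes.fromhex("0000000003" "089601")
print(list(split_grpc_messages(body)))  # [(False, b'\x08\x96\x01')]
```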

The HTTP/2 layer adds the routing. A gRPC call is one HTTP/2 stream, and the spec uses HTTP/2 stream IDs as the call identifier. The request is a POST whose :path pseudo-header is /Service-Name/MethodName, so the method you are calling is right there in the path even before you decode the body. The content-type is application/grpc, optionally suffixed (application/grpc+proto, +json, and so on). The header te: trailers is required and is a reliable gRPC tell, because gRPC carries its status code in HTTP/2 trailers rather than in a normal response header. After the body, a trailing header block sends grpc-status (0 for OK) and optionally grpc-message. The spec is explicit that status goes in trailers even when the status is OK.

Two details bite people doing interception. First, the spec says DATA frame boundaries have no relation to message boundaries, so one logical message can be split across several HTTP/2 DATA frames and one frame can hold several messages; you must reassemble at the gRPC framing layer, not the HTTP/2 frame layer. Second, gRPC-Web exists as a separate variant for browsers, because browsers cannot manipulate HTTP/2 trailers or raw frames directly, so gRPC-Web moves the trailers into the message body and uses a slightly different content type. If you are reversing a web app rather than a native one, you are likely looking at gRPC-Web, not classic gRPC.
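The first of those details is easiest to see with a small buffer that accepts DATA frame payloads in arrival order and hands back whole messages only when they are complete. This is a sketch for one stream; the class name GrpcReassembler is hypothetical, and it ignores compression.

```python
import struct


class GrpcReassembler:
    """Accumulate DATA frame payloads for one stream and emit whole messages."""

    def __init__(self):
        self._buf = bytearray()

    def feed(self, data_frame_payload: bytes) -> list:
        """Add one DATA frame payload; return any messages now complete."""
        self._buf += data_frame_payload
        messages = []
        while len(self._buf) >= 5:
            compressed, length = struct.unpack_from(">BI", self._buf, 0)
            if len(self._buf) < 5 + length:
                break                      # wait for more frames
            messages.append((bool(compressed), bytes(self._buf[5:5 + length])))
            del self._buf[:5 + length]
        return messages


stream = GrpcReassembler()
print(stream.feed(bytes.fromhex("000000000308")))    # []  (message incomplete)
print(stream.feed(bytes.fromhex("96010000000001")))  # [(False, b'\x08\x96\x01')]
print(stream.feed(bytes.fromhex("2a")))              # [(False, b'*')]
```

The second frame finishes one message and starts the next, which is the case that breaks any tool assuming one message per frame.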

Getting to the point where you can read any of this requires defeating transport encryption and often certificate pinning first, which is its own subject. The companion post on reverse-engineering a mobile app’s API covers the TLS-interception and pinning side; this post assumes you have already gotten the plaintext bytes in front of you and starts from there.

The tooling that reads the skeleton for you

You can decode protobuf by hand, and doing it once is genuinely worth it because it demystifies the format. After that you want tools. There are three jobs the tools do: disassemble raw bytes into a readable structural dump, guess types and let you re-encode, and recover the actual schema from a compiled binary. Different tools own different jobs.

protoscope: a faithful, schema-free disassembler

Protoscope, maintained by the protocolbuffers organization itself, is a small language and tool for representing protobuf wire-format bytes as human-editable text and emitting them back. It was inspired by DER ASCII, a similar tool for ASN.1. The point of protoscope is fidelity, not interpretation. It knows the wire format and almost nothing about schemas, so it shows you field numbers, wire types, varints, and length prefixes as text, and you can edit that text and assemble it straight back to bytes.

You install it with go install github.com/protocolbuffers/protoscope/cmd/protoscope...@latest and run the disassembler over a captured message. The output is the structural skeleton: each field by number, each value rendered in a way that round-trips. The project’s strongest guarantee is exactly that round-trip. Its README states that piping bytes through disassembly and reassembly, protoscope | protoscope -s, is always equivalent to cat, so the tool never silently changes your data.

The honest caveat is built into the design. Because protoscope has no schema, its disassembler uses heuristics to decide whether a LEN field is a nested message or a string, and the project documentation calls these heuristics “necessarily imperfect.” It will sometimes render a string as a nested message or vice versa. That is not a bug, it is the LEN ambiguity from earlier showing through. Protoscope’s answer is to be transparent about the guess and to let you override it, rather than to pretend certainty it does not have.

blackboxprotobuf: guess, edit, re-encode

Blackbox Protobuf, from NCC Group, is a Python library and a Burp Suite extension for decoding and re-encoding protobuf without the .proto. Where protoscope aims for a faithful textual dump, blackboxprotobuf aims to give you a typed, editable structure and to let you change a value and put it back on the wire. It is built for the security workflow: intercept a request, decode it, modify a field, re-encode, replay.

Its type guessing follows the heuristics the format forces on it. For a varint field it defaults to a signed integer. For fixed-width fields it leans toward floating point. For LEN fields it tries to parse the payload as a nested message and falls back to string or bytes if that fails. It emits both a decoded dictionary and a generated typedef, and the typedef is the important artifact, because you can correct it (this field is really a string, that one is really a packed array, this nested message has these sub-fields) and give fields human names, then reuse it across many captures of the same endpoint. The library is candid that the guesses are imperfect and that packed repeated fields in particular need human intervention, since they carry no per-element tags to mark themselves.

The Burp extension is the part most people touch. It surfaces intercepted protobuf bodies as an editable tree inside the proxy, so you can fuzz an endpoint, flip a field, and watch the server’s reaction without ever writing the encode/decode code yourself. Two sibling projects extend it: ProtoDeep wraps the same library with a friendlier analysis UI, and there are maintained forks that track newer Python and Burp releases.

pbtk: recovering the actual schema from the binary

The tools above work on bytes. Pbtk, by marin-m, works on the application. Its premise is that the schema is usually not truly gone; it is compiled into the app, and you can extract it. Pbtk has two halves: extracting .proto structures out of compiled programs, and editing, replaying, and fuzzing data against protobuf endpoints.

The extraction is the interesting half because it recovers names, the one thing the wire format throws away. Pbtk handles several embeddings. It pulls definitions out of Java runtimes including the Lite, Nano, Micro, and J2ME variants, with support for ProGuard-obfuscated builds. It reads embedded reflection metadata, which many C++ and some Java binaries carry, and reconstructs .proto files from it. It handles certain web applications. The web extractor relies on an older JavaScript protobuf runtime and, by the project’s own note, wants updating for current builds, so your mileage on a modern web app varies. When it works, the payoff is large: you go from “field 3 is a varint that is sometimes 1 and sometimes 2” to “field 3 is the enum AccountState with named values,” which is the difference between guessing and knowing.

A note on the legal and ethical line, because pbtk crosses into pulling apart someone else’s binary. Extracting a schema from an app you do not own touches reverse-engineering law, terms of service, and sometimes anti-circumvention statutes, and the answer is jurisdiction-specific and fact-specific. Testing your own app, an app you are authorized to audit, or interoperating where the law permits it is one thing; pulling apart a third party’s binary to attack their service is another. This post is about how the formats work, not a license to take any particular action against any particular target.

mitmproxy’s interactive contentviews

The newest piece of this toolkit is built into the proxy itself. Mitmproxy 12, released April 29, 2025, added interactive contentviews, which let you edit a binary message through its prettified, human-readable representation and have the proxy re-encode it back to bytes automatically. For protobuf, you can point the protobuf_definitions option at a .proto file and edit by field name; without a .proto, you can still edit primitive values (strings, integers, nested messages) and mitmproxy re-encodes the change. The announcement names gRPC, protobuf, and MessagePack as the binary protocols this makes tractable. The mechanism is a paired prettify and reencode API, so the round-trip is a first-class operation rather than a hack.

This collapses a lot of the workflow into one window. You no longer pipe captured bytes through a separate disassembler, edit text, reassemble, and inject. You intercept, you see the tree, you change a value, you forward. For exploratory probing of an unknown endpoint, that loop is fast enough to change how the work feels.

From skeleton to meaning: the part the tools cannot do

Every tool above gets you the skeleton. Field 1 is a varint that holds 150. Field 2 is a string. Field 5 is a nested message with three sub-fields. What none of them gives you, because it is genuinely not in the bytes, is what those fields mean. That last mile is human work, and it is the same detective method whether the format is protobuf, MessagePack, or anything else with structure but no labels.

The method is correlation. You change one thing in the app and watch which field in the payload changes. You type a known value into a search box and find the field that now holds that string. You toggle a setting and watch a varint flip between 0 and 1, which suggests it is a bool. You log in as a different account and watch the long varint that must be a user ID change. You capture the same request twice a second apart and find the field whose I64 value increments like a millisecond timestamp. Field by field, you build a labeled .proto from observed behavior rather than from documentation, and each label you pin makes the next one easier because it constrains what the neighbors can be.
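The bookkeeping side of correlation is just a diff between two decoded skeletons. A sketch, assuming each capture has already been decoded into a dict keyed by field number; the function name and the sample values are invented for illustration.

```python
def changed_fields(before: dict, after: dict) -> dict:
    """Return {field: (old, new)} for every field that differs between captures."""
    keys = before.keys() | after.keys()
    return {
        key: (before.get(key), after.get(key))
        for key in sorted(keys)
        if before.get(key) != after.get(key)
    }


# Hypothetical captures: same endpoint, search term changed and a filter toggled on.
capture_a = {1: 150, 2: "shoes", 3: 0}
capture_b = {1: 150, 2: "boots", 3: 0, 7: 1}
print(changed_fields(capture_a, capture_b))  # {2: ('shoes', 'boots'), 7: (None, 1)}
```

Changing one thing per capture keeps the diff down to a field or two, which is what makes each label trustworthy.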

Certain field shapes telegraph their meaning. A 16-byte LEN field is probably a UUID or a binary token. An eight-byte I64 that grows monotonically across captures is almost certainly a timestamp. A varint that only ever takes a handful of small distinct values is an enum, and the distinct values are the enum members waiting for names. A LEN field of high-entropy bytes that changes every request and that the server validates is a signature or a nonce, and that is where the work connects to the harder defensive topics, because validated, opaque tokens are exactly what anti-bot and anti-tamper systems plant in payloads. If the surrounding app obfuscates the code that builds those fields, you are into the territory of control-flow flattening and string encryption and the broader anti-bot integrity-check arms race, where the protocol is the easy part and the token-generation logic is the wall.
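For the I64 case, it helps to print every plausible reading of the eight bytes side by side and see which one looks sane. A sketch; interpret_i64 is a made-up helper, and the window of 2000 to 2100 used to flag millisecond timestamps is an arbitrary assumption of mine.

```python
import struct
from datetime import datetime, timezone

# Assumed plausibility window for "milliseconds since the Unix epoch".
MS_2000 = 946_684_800_000
MS_2100 = 4_102_444_800_000


def interpret_i64(raw: bytes) -> dict:
    """Show the candidate readings of an eight-byte I64 value (little-endian)."""
    if len(raw) != 8:
        raise ValueError("I64 values are exactly eight bytes")
    (as_uint,) = struct.unpack("<Q", raw)
    (as_int,) = struct.unpack("<q", raw)
    (as_double,) = struct.unpack("<d", raw)
    readings = {"uint64": as_uint, "int64": as_int, "double": as_double}
    if MS_2000 <= as_uint < MS_2100:
        moment = datetime.fromtimestamp(as_uint / 1000, tz=timezone.utc)
        readings["ms_timestamp"] = moment.isoformat()
    return readings


sample = (1_700_000_000_000).to_bytes(8, "little")
print(interpret_i64(sample)["ms_timestamp"])  # 2023-11-14T22:13:20+00:00
```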

There is a clean asymmetry worth ending on. Protobuf and MessagePack made opposite bets on where type information should live, and that bet decides how much a reverse engineer can recover from bytes alone. MessagePack put the types on the wire, so its blobs decode to fully typed trees with no schema and the only missing piece is the meaning of the keys. Protobuf put the types in the schema, so its blobs give you structure and field numbers but force you to guess types and reconstruct names, and the LEN ambiguity means even the structure is sometimes a guess. Neither choice was wrong; they optimized for different things, protobuf for wire size and schema evolution, MessagePack for self-description. But if you are the one staring at an undocumented payload, the format’s old design decision is sitting right there in front of you, deciding for each byte whether you get to read it or have to deduce it. The bytes were never opaque. They just told you different amounts about themselves depending on a choice someone made years before you ever pointed a proxy at them.

Sources & further reading

Protocol Buffers project (2024), Encoding — the authoritative wire-format spec: varints, the (field_number << 3) | wire_type tag, the six wire types, ZigZag, LEN records, packed fields.
protocolbuffers (2024), protoscope — Google’s schema-free wire-format disassembler/assembler, with the round-trip guarantee and the “necessarily imperfect” heuristic caveat.
NCC Group (2024), blackboxprotobuf — Python library and Burp extension for decoding, editing, and re-encoding protobuf without a .proto, including the type-guessing heuristics.
msgpack (2024), MessagePack specification — the format-byte map: fixint, fixmap, fixarray, fixstr, the explicit-length families, bin, and ext.
gRPC project (2024), gRPC over HTTP/2 — the Length-Prefixed-Message framing, stream mapping, required headers, and trailer-carried grpc-status.
marin-m (2024), pbtk — toolkit for extracting .proto schemas from compiled apps (Java/ProGuard, embedded reflection metadata, web) and fuzzing protobuf endpoints.
mitmproxy (2025), mitmproxy 12 — interactive contentviews that prettify and re-encode gRPC/protobuf/MessagePack, with or without a schema.
Kreya (2023), Demystifying the protobuf wire format — a clear walkthrough of tags, varints, and records that complements the official spec.
Synacktiv (2023), mitmproxy for fun and profit — a practitioner write-up on intercepting and analysing application traffic including binary protocols.
ydkhatri / swiftforensics (2020), Parsing unknown protobufs with python — a forensic perspective on decoding schema-less protobuf, the origin of one widely used blackboxprotobuf fork.