encoding

Go package containing implementations of encoders and decoders for various data formats.

Motivation

At Segment, we do a lot of marshaling and unmarshaling of data when sending, queuing, or storing messages. The resources we need to provision on the infrastructure are directly related to the type and amount of data that we are processing. At the scale we operate at, the tools we choose to build programs can have a large impact on the efficiency of our systems. It is important to explore alternative approaches when we reach the limits of the code we use.

This repository includes experiments for Go packages for marshaling and unmarshaling data in various formats. While the focus is on providing a high-performance library, we also aim for very low development and maintenance overhead by implementing APIs that can be used as drop-in replacements for the default solutions.

Requirements and Maintenance Schedule

This package has no dependencies outside of the core runtime of Go. It requires a recent version of Go.

This package follows the same maintenance schedule as the Go project, meaning that issues relating to versions of Go which aren't supported by the Go team, or versions of this package which are older than 1 year, are unlikely to be considered.

Additionally, we have fuzz tests which aren't a required runtime dependency but will be pulled in when running go mod tidy. Please don't include these go.mod updates in change requests.

encoding/json

More details about the implementation of this package can be found here.

The json sub-package provides a re-implementation of the functionalities offered by the standard library's encoding/json package, with a focus on lowering the CPU and memory footprint of the code.

The exported API of this package mirrors the standard library's encoding/json package. The only change needed to take advantage of the performance improvements is the import path of the json package, from:

import (
    "encoding/json"
)

to

import (
    "github.com/segmentio/encoding/json"
)
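As a rough illustration of what "drop-in" means here, the sketch below round-trips a value through Marshal and Unmarshal. The `Event` type and the `roundTrip` helper are hypothetical names made up for this example. The block imports the standard library so that it runs without any dependency; the comment on the import line marks the only line that would change.

```go
package main

import (
	"encoding/json" // swap for "github.com/segmentio/encoding/json"
	"fmt"
)

// Event is a hypothetical payload type used only for this illustration.
type Event struct {
	Name  string `json:"name"`
	Count int    `json:"count"`
}

// roundTrip marshals an Event to JSON and unmarshals the bytes back,
// returning both the decoded value and the encoded bytes.
func roundTrip(in Event) (Event, []byte, error) {
	b, err := json.Marshal(in)
	if err != nil {
		return Event{}, nil, err
	}
	var out Event
	if err := json.Unmarshal(b, &out); err != nil {
		return Event{}, nil, err
	}
	return out, b, nil
}

func main() {
	out, b, err := roundTrip(Event{Name: "page", Count: 3})
	if err != nil {
		panic(err)
	}
	fmt.Println(string(b), out.Name, out.Count)
}
```

Nothing below the import block refers to the package by anything other than the name `json`, which is why changing the import path should be enough.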

The improvement can be significant for code that heavily relies on serializing and deserializing JSON payloads. The CI pipeline runs benchmarks to compare the performance of the package with the standard library and other popular alternatives; here's an overview of the results (using Go v1.13):

Comparing to encoding/json

goos: linux
goarch: amd64

name                           old time/op    new time/op     delta
Marshal/*json.codeResponse2      9.05ms ±12%     6.40ms ±23%   -29.34%  (p=0.000 n=8+8)
Unmarshal/*json.codeResponse2    35.3ms ± 7%      9.6ms ± 0%   -72.83%  (p=0.001 n=7+7)

name                           old speed      new speed       delta
Marshal/*json.codeResponse2     215MB/s ±13%    310MB/s ±20%   +43.80%  (p=0.000 n=8+8)
Unmarshal/*json.codeResponse2  55.1MB/s ± 7%  202.5MB/s ± 0%  +267.41%  (p=0.001 n=7+7)

name                           old alloc/op   new alloc/op    delta
Marshal/*json.codeResponse2       0.00B           0.00B           ~     (all equal)
Unmarshal/*json.codeResponse2    1.86MB ± 1%     0.01MB ± 1%   -99.52%  (p=0.000 n=8+8)

name                           old allocs/op  new allocs/op   delta
Marshal/*json.codeResponse2        0.00            0.00           ~     (all equal)
Unmarshal/*json.codeResponse2     76.4k ± 0%       0.0k ± 0%   -99.95%  (p=0.000 n=8+8)

Comparing to github.com/json-iterator/go

goos: linux
goarch: amd64

name                           old time/op    new time/op     delta
Marshal/*json.codeResponse2      29.9ms ± 4%      6.4ms ±23%   -78.61%  (p=0.000 n=7+8)
Unmarshal/*json.codeResponse2    12.6ms ± 6%      9.6ms ± 0%   -23.77%  (p=0.001 n=7+7)

name                           old speed      new speed       delta
Marshal/*json.codeResponse2    64.9MB/s ± 4%  309.8MB/s ±20%  +377.19%  (p=0.000 n=7+8)
Unmarshal/*json.codeResponse2   152MB/s ±10%    202MB/s ± 0%   +32.97%  (p=0.000 n=8+7)

name                           old alloc/op   new alloc/op    delta
Marshal/*json.codeResponse2      3.40MB ± 0%     0.00MB       -100.00%  (p=0.000 n=8+8)
Unmarshal/*json.codeResponse2    1.03MB ± 0%     0.01MB ± 1%   -99.14%  (p=0.001 n=6+8)

name                           old allocs/op  new allocs/op   delta
Marshal/*json.codeResponse2        102k ± 0%         0k       -100.00%  (p=0.000 n=8+8)
Unmarshal/*json.codeResponse2     37.1k ± 0%       0.0k ± 0%   -99.89%  (p=0.000 n=6+8)

Although this package aims to be a drop-in replacement of encoding/json, it does not guarantee the same error messages. It will error in the same cases as the standard library, but the exact error message may be different.
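One practical consequence is that calling code should avoid comparing error strings. The sketch below checks for failure by value and by type instead. The `isSyntaxError` helper is a hypothetical name, and matching on `*json.SyntaxError` assumes the replacement package returns error types compatible with the standard library's, which this README does not spell out. The block imports the standard library so that it runs as written.

```go
package main

import (
	"encoding/json" // or "github.com/segmentio/encoding/json"
	"errors"
	"fmt"
)

// isSyntaxError reports whether decoding data fails with a JSON syntax
// error. It inspects the error's type and never looks at its message.
func isSyntaxError(data []byte) bool {
	var v interface{}
	err := json.Unmarshal(data, &v)

	// Fragile alternative to avoid: err.Error() == "unexpected end of JSON input"
	var syn *json.SyntaxError
	return errors.As(err, &syn)
}

func main() {
	fmt.Println(isSyntaxError([]byte(`{"a":`)))
	fmt.Println(isSyntaxError([]byte(`{"a":1}`)))
}
```

Tests that assert on exact error text are the most likely place for a difference to show up after switching import paths.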

encoding/iso8601

The iso8601 sub-package exposes APIs to efficiently deal with string representations of ISO 8601 dates.

Data formats like JSON have no syntax to represent dates, so dates are usually serialized and represented as string values. In our experience, we often have to check whether a string value looks like a date, and either construct a time.Time by parsing it or simply treat it as a string. This check can be done by attempting to parse the value and, if that fails, falling back to the raw string. Unfortunately, while the happy path for time.Parse is fairly efficient, constructing errors is much slower and has a much bigger memory footprint.

We've developed fast iso8601 validation functions that cause no heap allocations to remediate this problem. We added a validation step to determine whether the value is a date representation or a simple string. This reduced CPU and memory usage by 5% in some programs that were doing time.Parse calls on very hot code paths.
