Document the examples, including Docker instructions; add BUILDING.md, update all READMEs #92

Merged 16 commits on Nov 10, 2023
46 changes: 24 additions & 22 deletions CONTRIBUTING.md
@@ -50,34 +50,36 @@ export PATH="$PATH:$(go env GOPATH)/bin"
Once the `*.pb.go` files are generated, you need to replace the content of the `api/collector/arrow/v1` directory by the
generated files present in the `./proto/api/collector/arrow/v1` directory.

### Releasing this repository

See the instructions in [RELEASING.md](./RELEASING.md).

### Local development issues

The network of dependencies involved in building OpenTelemetry
Collector images has at times pushed the `go mod` toolchain to its
limits. While we would like to recommend the `go work` tool for local
development, there are currently unresolvable dependency problems that
happen as a result of this.

The traditional solution to multi-module repositories before `go
work` was introduced is the Go module `replace` statement, which
allows mapping inter-repository dependencies to local directory paths,
allowing you to build and test an OpenTelemetry collector with
locally-modified sources.

While the use of replace statements works to enable local development,
it prevents running code directly from the repository, which raises a
barrier to entry. To work around this problem, the checked-in
contents of `./collector/cmd/otelarrowcol/go.mod` must not contain
`replace` statements. To build an `otelarrowcol` from locally
modified sources requires uncommenting the `replaces` directive in
`./collector/cmd/otelarrowcol/build-config.yaml` and re-running `make
genotelarrowcol otelarrowcol`.
This repository contains a top-level `go.work` file. This enables the
Go modules defined here to avoid relative replace statements, which
interfere with the ability to run them via simple `go install` and `go
run` commands. The `go.work` file names all the module definitions
inside this repository and allows them all to be used at once during
local development.
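
As a rough illustration of working with the workspace, the following shell sketch inspects the `go.work` file and builds the collector from the repository root. The `run` helper and the `DRY_RUN` variable are hypothetical conveniences introduced here and are not part of this repository. The build command mirrors the one used in the Dockerfile and assumes the generated sources under `collector/cmd/otelarrowcol` are present.

```shell
#!/bin/sh
# Hypothetical helper: with DRY_RUN=1 it prints the command instead of
# executing it, so the sketch can be read or tested without side effects.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Print the contents of the checked-in go.work file as JSON,
# including the modules it names.
show_workspace() {
  run go work edit -json
}

# Build the collector from the repository root; go.work resolves the
# in-repository modules without relative replace statements.
build_from_workspace() {
  run go build -o otelarrowcol ./collector/cmd/otelarrowcol
}
```

For example, `DRY_RUN=1 build_from_workspace` prints the build command without running it.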

### Upgrading OpenTelemetry Collector dependencies

When a new version of the OpenTelemetry collector is available,
the easiest way to upgrade this repository is:

1. Update the `distribution::otelcol_version` field in `otelarrowcol-build.yaml`
2. Modify any components from the core or contrib repositories to use
the corresponding versions (e.g., pprofextension's module version
should match the new collector release).
3. Regenerate `otelarrowcol` via `make genotelarrowcol`
4. Run `go work sync` to update the other modules with fresh dependencies.
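
Steps 3 and 4 above can be sketched as a small shell function, run from the repository root. Steps 1 and 2 are manual edits to the build configuration and are not automated here. The `run` helper, the `DRY_RUN` variable, and the function name `upgrade_collector_deps` are hypothetical and introduced only for illustration.

```shell
#!/bin/sh
# Hypothetical helper: with DRY_RUN=1 it prints the command instead of
# executing it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

upgrade_collector_deps() {
  # Step 3: regenerate the otelarrowcol sources.
  run make genotelarrowcol || return 1
  # Step 4: update the other modules with fresh dependencies.
  run go work sync
}
```

Running `DRY_RUN=1 upgrade_collector_deps` lists the two commands in order, which is a cheap way to check the sequence before executing it.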

[OTCDOCS]: https://opentelemetry.io/docs/collector/
[OTCGH]: https://github.com/open-telemetry/opentelemetry-collector
[OACGH]: https://github.com/open-telemetry/otel-arrow-collector
[EXPORTER]: https://github.com/open-telemetry/otel-arrow/collector/exporter/otelarrowexporter
[RECEIVER]: https://github.com/open-telemetry/otel-arrow/collector/receiver/otelarrowreceiver
[EXPORTER]: ./collector/exporter/otelarrowexporter/README.md
[RECEIVER]: ./collector/receiver/otelarrowreceiver/README.md
[DONATION]: https://github.com/open-telemetry/community/issues/1332
[DEVPROCESS]: https://github.com/open-telemetry/otel-arrow-collector/issues/48
[OTLPRECEIVER]: https://github.com/open-telemetry/opentelemetry-collector/receiver/otlpreceiver
41 changes: 41 additions & 0 deletions Dockerfile
@@ -0,0 +1,41 @@
# Copyright The OpenTelemetry Authors
# SPDX-License-Identifier: Apache-2.0

# This Dockerfile builds the OpenTelemetry Protocol with Apache Arrow
# Collector from the code in this repository. It builds using the
# "otelarrowcol" configuration. See collector/otelarrowcol-build.yaml
# for the components that are included in the build, which are all of
# those with sources in this repository plus a few commonly useful
# accessories (e.g., the profiler extension).
FROM golang:1.21 AS sandbox

WORKDIR /otel-arrow
COPY . .
ENV CGO_ENABLED=0

# Note we recommend using the latest released builder, which will
# update the core OpenTelemetry collector libraries to the newest
# possible versions. When the latest set of collector dependencies
# leads to a broken build here, this `latest` can instead be set
# to the last-successful version of the OpenTelemetry collector.
RUN go install go.opentelemetry.io/collector/cmd/builder@latest

# This command generates main.go, go.mod but does not update deps.
RUN builder --skip-compilation --skip-get-modules --config=collector/otelarrowcol-build.yaml

# This build will update the go.mod, using the checked-in go.work file
# in the repository.
RUN go build -o otelarrowcol ./collector/cmd/otelarrowcol

# This build uses an Alpine Linux container.
FROM alpine AS release
COPY --from=sandbox /otel-arrow/otelarrowcol /

# Network ports
# 4317 - OpenTelemetry gRPC services:
# - OpenTelemetry Protocol with Apache Arrow
# - OpenTelemetry Protocol (OTLP)
# 1777 - Profiling support
EXPOSE 4317/tcp 1777/tcp

ENTRYPOINT ["/otelarrowcol"]
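
A minimal sketch of building and running this image, assuming Docker is installed and the commands are run from the repository root. The `run` helper, the `DRY_RUN` and `CONFIG` variables, and the `/config.yaml` mount path are assumptions made for illustration; the image tag matches the `docker-otelarrowcol` Makefile target, and the published ports are the two the Dockerfile exposes.

```shell
#!/bin/sh
# Hypothetical helper: with DRY_RUN=1 it prints the command instead of
# executing it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Build the image (same command as the docker-otelarrowcol target).
build_image() {
  run docker build . -t otelarrowcol
}

# Start the collector with a local configuration file mounted into the
# container. CONFIG is an assumed path to that file on the host.
run_collector() {
  config="${CONFIG:-$PWD/config.yaml}"
  run docker run --rm \
    -p 4317:4317 -p 1777:1777 \
    -v "$config:/config.yaml" \
    otelarrowcol --config=/config.yaml
}
```

Because the image sets `ENTRYPOINT ["/otelarrowcol"]`, the trailing `--config=/config.yaml` is passed to the collector binary as an argument.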
25 changes: 13 additions & 12 deletions Makefile
@@ -11,8 +11,6 @@ MODULES := $(shell find . -name go.mod)

GODIRS := $(foreach d,$(MODULES),$(shell dirname $d))
GOCMD?= go
GOOS := $(shell $(GOCMD) env GOOS)
GOARCH := $(shell $(GOCMD) env GOARCH)
BUILD_INFO=-ldflags "-X $(BUILD_INFO_IMPORT_PATH).Version=$(VERSION)"
VERSION=$(shell git describe --always --match "v[0-9]*" HEAD)
BUILD_INFO_IMPORT_PATH=go.opentelemetry.io/collector/internal/version
@@ -104,21 +102,24 @@ endif
	git add .
	git commit -m "add multimod changes $(RELEASE_CANDIDATE)" || (echo "no multimod changes to commit")

# OTC's builder can be installed using:
#
# $(GOCMD) install go.opentelemetry.io/collector/cmd/builder@latest
#
# TODO install this locally
BUILDER := builder
# Install OTC's builder at the latest version
BUILDER = builder
.PHONY: $(BUILDER)
builder:
	$(GOCMD) install go.opentelemetry.io/collector/cmd/builder@latest

.PHONY: genotelarrowcol
genotelarrowcol:
	$(GOCMD) install go.opentelemetry.io/collector/cmd/builder@latest
	$(BUILDER) --skip-compilation --config collector/cmd/otelarrowcol/build.yaml --output-path collector/cmd/otelarrowcol
genotelarrowcol: builder
	rm -f collector/cmd/otelarrowcol/*
	$(BUILDER) --skip-compilation --skip-get-modules --config collector/otelarrowcol-build.yaml
	$(GOCMD) work sync

.PHONY: otelarrowcol
otelarrowcol:
	(cd collector/cmd/otelarrowcol && \
		GO111MODULE=on CGO_ENABLED=0 \
		$(GOCMD) build -trimpath -o ../../../bin/otelarrowcol_$(GOOS)_$(GOARCH) $(BUILD_INFO) .)
		$(GOCMD) build -trimpath -o ../../../bin/otelarrowcol $(BUILD_INFO) .)

.PHONY: docker-otelarrowcol
docker-otelarrowcol:
	docker build . -t otelarrowcol
105 changes: 35 additions & 70 deletions README.md
@@ -1,10 +1,21 @@
# OTel Arrow
# OpenTelemetry Protocol with Apache Arrow

The [OTel Arrow](https://github.com/open-telemetry/otel-arrow) project
is an effort within [OpenTelemetry](https://opentelemetry.io/) to use
The [OpenTelemetry Protocol with Apache
Arrow](https://github.com/open-telemetry/otel-arrow) project is an
effort within [OpenTelemetry](https://opentelemetry.io/) to use
[Apache Arrow](https://arrow.apache.org/) libraries for bulk data
transport in OpenTelemetry collection pipelines. This repository is
the home of the OTel Arrow protocol and reference implementation.
the home of the OpenTelemetry Protocol with Apache Arrow protocol and
reference implementation.

## Quick start

Instructions for building an OpenTelemetry Collector with the modules
in this repository are provided in [`BUILDING.md`](./collector/BUILDING.md).

Examples for running the OpenTelemetry Collector with the modules in
this repository are documented in
[collector/examples](./collector/examples/README.md).

## Overview

@@ -32,23 +43,23 @@ than can be achieved using a stateless protocol.

## Project goals

The OTel Arrow project is organized in phases. Our initial aim is to
The OpenTelemetry Protocol with Apache Arrow project is organized in phases. Our initial aim is to
facilitate traffic reduction between a pair of OpenTelemetry
collectors, and ultimately, we believe that an end-to-end OTel Arrow
collectors, and ultimately, we believe that an end-to-end OpenTelemetry Protocol with Apache Arrow
pipeline will enable telemetry pipelines with substantially lower
overhead to be built. These are our future milestones for
OpenTelemetry and Apache Arrow integration:

1. Improve compression performance for OpenTelemetry data collection
2. Extend OpenTelemetry client SDKs to natively support the OTel Arrow Protocol
3. Extend the OpenTelemetry collector with direct support for OTel Arrow pipelines
2. Extend OpenTelemetry client SDKs to natively support the OpenTelemetry Protocol with Apache Arrow Protocol
3. Extend the OpenTelemetry collector with direct support for OpenTelemetry Protocol with Apache Arrow pipelines
4. Extend OpenTelemetry data model with support for multi-variate metrics.
5. Output OpenTelemetry data to the Parquet file format, part of the Apache Arrow ecosystem

### Improve network-level compression with OTel Arrow
### Improve network-level compression with OpenTelemetry Protocol with Apache Arrow

The first general-purpose application for OTel Arrow is traffic
reduction. At a high-level, OTel Arrow performs the following steps
The first general-purpose application for the project is traffic
reduction. At a high level, this protocol performs the following steps
to compactly encode and transmit telemetry using Apache Arrow.

1. Separate the OpenTelemetry Resource and Scope elements from the
@@ -61,10 +72,10 @@ to compactly encode and transmit telemetry using Apache Arrow.
delta-dictionaries, and other low-level facilities to compactly
encode the structure.

Here is a diagram showing how OTel Arrow transforms OTLP Log Records
Here is a diagram showing how the protocol transforms OTLP Log Records
into column-oriented data, which also makes the data more compressible.

![OTel Arrow Log Record](https://github.com/open-telemetry/oteps/blob/main/text/img/0156_logs_schema.png?raw=true)
![OpenTelemetry Protocol with Apache Arrow](https://github.com/open-telemetry/oteps/blob/main/text/img/0156_logs_schema.png?raw=true)

## Project status

@@ -82,39 +93,39 @@ workloads.
We are pleased to release two new collector components, presently
housed in this repository.

- [OTel Arrow Receiver](./collector/receiver/otelarrowreceiver/README.md)
- [OTel Arrow Exporter](./collector/exporter/otelarrowexporter/README.md)
- [OpenTelemetry Protocol with Apache Arrow Receiver](./collector/receiver/otelarrowreceiver/README.md)
- [OpenTelemetry Protocol with Apache Arrow Exporter](./collector/exporter/otelarrowexporter/README.md)

We are working with the maintainers of the [OpenTelemetry
Collector-Contrib](https://github.com/open-telemetry/opentelemetry-collector-contrib)
to merge these components into that repository. [See our tracking
issue](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/26491).

The OTel Arrow exporter and receiver components are drop-in compatible
The OpenTelemetry Protocol with Apache Arrow exporter and receiver components are drop-in compatible
with the core collector's OTLP exporter and receiver components.
Users with an established OTLP collection pipeline between two
OpenTelemetry Collectors can re-build their collectors with
`otelarrow` components, then simply replace the component name `otlp`
with `otelarrow`. The exporter and receiver both support falling back
to standard OTLP in case either side does not recognize the protocol,
so the upgrade should be painless. The OTel Arrow receiver serves
both OTel Arrow and OTLP on the standard port for OTLP gRPC (4317).
so the upgrade should be painless. The OpenTelemetry Protocol with Apache Arrow receiver serves
both OpenTelemetry Protocol with Apache Arrow and OTLP on the standard port for OTLP gRPC (4317).

See the [Exporter](collector/exporter/otelarrowexporter/README.md) and
[Receiver](collector/receiver/otelarrowreceiver/README.md)
documentation for details and sample configurations.

### Project documentation

This package is a reference implementation of the OTel Arrow protocol
This package is a reference implementation of the OpenTelemetry Protocol with Apache Arrow protocol
specified in this
[OTEP](https://github.com/open-telemetry/oteps/blob/main/text/0156-columnar-encoding.md),
which is currently the best source of information about OTel Arrow.
which is currently the best source of information about OpenTelemetry Protocol with Apache Arrow.
The [Donation
request](https://github.com/open-telemetry/community/issues/1332)
describes how the project began.

Here are several more resources that are available to learn more about OTel Arrow.
Here are several more resources that are available to learn more about OpenTelemetry Protocol with Apache Arrow.

- [Arrow Data Model](docs/data_model.md) - Mapping OTLP entities to Arrow Schemas.
- [Benchmark results](docs/benchmarks.md) - Based on synthetic and production data.
@@ -128,7 +139,7 @@ Here are several more resources that are available to learn more about OTel Arro
The following chart shows the compressed message size (in bytes) as a function
of the batch size for metrics (univariate), logs, and traces. The bottom of the
chart shows the reduction factor for both the standard OTLP protocol (with ZSTD
compression) and the OTel Arrow protocol (ZSTD) in comparison with an
compression) and the OpenTelemetry Protocol with Apache Arrow protocol (ZSTD) in comparison with an
uncompressed OTLP protocol.

![compression_ratio](./docs/img/compression_ratio_summary_std_metrics.png)
@@ -158,60 +169,14 @@ data has not been updated and this collector is still fundamentally row-oriented

![collector internal overview](docs/img/collector_internal_overview.png)

> Note 2: A future phase 2 of this project will focus on implementing end-to-end OTel Arrow to improve the overall
> performance.
> Note 2: A future phase 2 of this project will focus on implementing end-to-end OpenTelemetry Protocol with Apache Arrow to improve the overall performance.

### Developers

Pull requests are welcome. For major changes, please open an issue
first to discuss what you would like to change. For more information, please
read [CONTRIBUTING](CONTRIBUTING.md).

#### Version history shared with OpenTelemetry Collector

The OTel Arrow exporter and receiver components were derived from the
core OpenTelemetry Collector's OTLP exporter and receiver components.
This approach was taken to ensure that the OTel Arrow receiver and
exporter components remain "drop-in" compatible with OTLP exporters
and receivers.

The shared version history describing how the OTel Arrow components
were derived from the core OTLP components is recorded in [this
repository](https://github.com/open-telemetry/otel-arrow-collector),
which is not for general use. Maintainers of OTel Arrow may use the
repository to maintain the version history going forward, [as discussed
here](https://github.com/open-telemetry/otel-arrow/issues/43).

#### Example collector configurations

Examples demonstrating how to configure and test an OpenTelemetry
Collector with OTel-Arrow exporter and receiver components are located
in `./collector/examples`, including:

- [`examples/bridge`](https://github.com/f5/otel-arrow-adapter/tree/main/collector/examples/bridge):
A compression bridge between "edge" and "saas" collectors.
- [`examples/metadata-bridge`](https://github.com/f5/otel-arrow-adapter/tree/main/collector/examples/metadata-bridge):
A compression bridge between "edge" and "saas" collectors with metadata support, allowing request headers to transit via OTel-Arrow.
- [`examples/loopback`](https://github.com/f5/otel-arrow-adapter/tree/main/collector/examples/loopback):
A collector that writes Arrow to and from itself.
- [`examples/recorder`](https://github.com/f5/otel-arrow-adapter/tree/main/collector/examples/recorder):
A collector with support for recording data files for diagnostic and benchmark purposes.
- [`examples/synthesize`](https://github.com/f5/otel-arrow-adapter/tree/main/collector/examples/synthesize):
A collector with support for synthesizing telemetry data using a [telemetry-generator](https://github.com/lightstep/telemetry-generator) component.

#### Miscellaneous components

Several components were developed to facilitate testing and debugging
the primary OTel Arrow components. Most importantly, these tools can
be used to report problematic data to the OTel Arrow maintainers.
These components are:

- `exporter/fileexporter`: Derived from the upstream [fileexporter](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/fileexporter), this component supports writing files that can be read by the corresponding `filereceiver` in this package (unlike the upstream).
- `receiver/filereceiver`: Derived from the upstream [filereceiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/filereceiver), this component supports reading files written by the corresponding `fileexporter` in this package (unlike the upstream).
- `processor/obfuscationprocessor`: Supports obfuscation of OpenTelemetry data using a [Feistel cipher](https://en.wikipedia.org/wiki/Feistel_cipher).
- `processor/experimentprocessor`: A probabilistic routing component for conducting experiments between exporters.
- `connector/validationconnector`: A component for on-the-fly validation of a local pipeline.

## License

OTel Arrow Protocol Adapter is licensed under Apache 2.0.
OpenTelemetry Protocol with Apache Arrow Protocol Adapter is licensed under Apache 2.0.