Commit 561a2d6

jmacd and lquerel authored
Document the examples, including Docker instructions; add BUILDING.md, update all READMEs (#92)
This PR (at last!) fixes a number of outstanding Golang module issues. We now use a checked-in go.work file. The key to making the go.work approach work is running the OTC `builder` with its `--skip-get-modules` flag set. With that flag the builder generates the sources without updating dependencies, so the go.work file is used instead, and we now call `go work sync` to synchronize dependencies. There are no more `replace` statements in any of the go.mod files, so single-line commands will now work (after the next release).

The new BUILDING.md instructions explain how to build either with a local Golang toolchain or with a Docker Golang toolchain.

Each example has been tested and updated for a number of minor changes, including:

- batchprocessor -> concurrentbatchprocessor
- loggingexporter -> debugexporter
- eliminate arrow::disabled flags
- consistent port numbering
- working `printer` example (now with fewer dependencies)
- one fewer example; there was a little redundancy.

Fixes #72.

---------

Co-authored-by: Laurent Quérel <laurent.querel@gmail.com>
1 parent affade9 commit 561a2d6

42 files changed: 1966 additions, 3578 deletions.

CONTRIBUTING.md

24 additions, 22 deletions

@@ -50,34 +50,36 @@ export PATH="$PATH:$(go env GOPATH)/bin"
 Once the `*.pb.go` files are generated, you need to replace the content of the `api/collector/arrow/v1` directory by the
 generated files present in the `./proto/api/collector/arrow/v1` directory.
 
+### Releasing this repository
+
+See the instructions in [RELEASING.md][./RELEASING.md].
+
 ### Local development issues
 
-The network of dependencies involved in building OpenTelemetry
-Collector images has at times pushed the `go mod` toolchain to its
-limits. While we would like to recommend the `go work` tool for local
-development, there are currently unresolvable dependency problems that
-happen as a result of this.
-
-The traditional solution to multi-module repositories before `go
-work` was introduced is the Go module `replace` statement, which
-allows mapping inter-repository dependencies to local directory paths,
-allowing you to build and test an OpenTelemetry collector with
-locally-modified sources.
-
-While the use of replace statements works to enable local development,
-it prevents running code directly from the repository, which raises a
-barrier to entry. To work around this problem, the checked-in
-contents of `./collector/cmd/otelarrowcol/go.mod` must not contain
-`replace` statements. To build an `otelarrowcol` from locally
-modified sources requires uncommenting the `replaces` directive in
-`./collector/cmd/otelarrowcol/build-config.yaml` and re-running `make
-genotelarrowcol otelarrowcol`.
+This repository contains a top-level `go.work` file. This enables the
+Go modules defined here to avoid relative replace statements, which
+interfere with the ability to run them via simple `go install` and `go
+run` commands. The `go.work` file names all the module definitions
+inside this repository and allows them all to be used at once during
+local development.
+
+### Upgrading OpenTelemetry Collector dependencies
+
+When a new version of the OpenTelemetry collector, is available,
+the easiest way to upgrade this repository is:
+
+1. Update the `distribution::otelcol_version` field in `otelarrowcol-build.yaml`
+2. Modify any components from the core or contrib repositories to use
+   the corresponding versions (e.g., pprofextension's module version
+   should match the new collector release).
+3. Regenerate `otelarrowcol` via `make genotelarrowcol`
+4. Run `go work sync` to update the other modules with fresh dependencies.
 
 [OTCDOCS]: https://opentelemetry.io/docs/collector/
 [OTCGH]: https://github.com/open-telemetry/opentelemetry-collector
 [OACGH]: https://github.com/open-telemetry/otel-arrow-collector
-[EXPORTER]: https://github.com/open-telemetry/otel-arrow/collector/exporter/otelarrowexporter
-[RECEIVER]: https://github.com/open-telemetry/otel-arrow/collector/receiver/otelarrowreceiver
+[EXPORTER]: ./collector/exporter/otelarrowexporter/README.md
+[RECEIVER]: ./collector/receiver/otelarrowreceiver/README.md
 [DONATION]: https://github.com/open-telemetry/community/issues/1332
 [DEVPROCESS]: https://github.com/open-telemetry/otel-arrow-collector/issues/48
 [OTLPRECEIVER]: https://github.com/open-telemetry/opentelemetry-collector/receiver/otlpreceiver
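
Reviewer note: the upgrade steps added to CONTRIBUTING.md above can be sketched as a short shell session. This is only an illustration and makes two assumptions: it is run from the repository root, and steps 1 and 2 (editing the versions in the build YAML) were already done by hand. The `run` helper and the `DRY_RUN` variable are hypothetical and not part of this commit; by default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the "Upgrading OpenTelemetry Collector dependencies" steps.
# Assumption: run from the repository root. DRY_RUN defaults to 1, so the
# commands are printed rather than executed; set DRY_RUN=0 to execute them.
DRY_RUN="${DRY_RUN:-1}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

upgrade_collector_deps() {
    # Steps 1 and 2 are manual edits to the build YAML: bump
    # distribution::otelcol_version and the matching component versions.
    run make genotelarrowcol   # step 3: regenerate the otelarrowcol sources
    run go work sync           # step 4: refresh dependencies in the other modules
}

upgrade_collector_deps
```

In this commit the `genotelarrowcol` target already ends with `go work sync`, so the explicit step 4 repeats that call; it mirrors the documented list and should be harmless.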

Dockerfile

41 additions

@@ -0,0 +1,41 @@
+# Copyright The OpenTelemetry Authors
+# SPDX-License-Identifier: Apache-2.0
+
+# This Dockerfile builds the OpenTelemetry Protocol with Apache Arrow
+# Collector from the code in this repository. It builds using the
+# "otelarrowcol" configuration. See collector/otelarrowcol-build.yaml
+# for the components that are included in the build, which are all of
+# those with sources in this repository plus a few commonly useful
+# accessories (e.g., the profiler extension).
+FROM golang:1.21 AS sandbox
+
+WORKDIR /otel-arrow
+COPY . .
+ENV CGO_ENABLED=0
+
+# Note we recommend using the latest released builder, which will
+# update the core OpenTelemetry collector libraries to the newest
+# possible versions. When the latest set of collector dependencies
+# leads to a broken build here, this `latest` can instead be set
+# to the last-successful version of the OpenTelemetry collector.
+RUN go install go.opentelemetry.io/collector/cmd/builder@latest
+
+# This command generates main.go, go.mod but does not update deps.
+RUN builder --skip-compilation --skip-get-modules --config=collector/otelarrowcol-build.yaml
+
+# This build will update the go.mod, using the checked-in go.work file
+# in the repository.
+RUN go build -o otelarrowcol ./collector/cmd/otelarrowcol
+
+# This build uses an Alpine Linux container.
+FROM alpine AS release
+COPY --from=sandbox /otel-arrow/otelarrowcol /
+
+# Network ports
+# 4317 - OpenTelemetry gRPC services:
+#      - OpenTelemetry Protocol with Apache Arrow
+#      - OpenTelemetry Protocol (OTLP)
+# 1777 - Profiling support
+EXPOSE 4317/tcp 1777/tcp
+
+ENTRYPOINT ["/otelarrowcol"]
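
Reviewer note: a minimal sketch of how the image defined by this Dockerfile might be built and started. The `docker build` line matches the `docker-otelarrowcol` Makefile target in this commit; the `docker run` line is an assumption about typical usage, and `my-config.yaml` is a hypothetical collector configuration file that is not part of the commit. The `run` helper and `DRY_RUN` are likewise hypothetical; by default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: build the otelarrowcol image and run it with a mounted config.
# Assumption: run from the repository root. DRY_RUN defaults to 1 (print only).
DRY_RUN="${DRY_RUN:-1}"
# Hypothetical configuration file; substitute a real collector config.
CONFIG="${CONFIG:-my-config.yaml}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

build_and_run_image() {
    # Same command as the docker-otelarrowcol Makefile target.
    run docker build . -t otelarrowcol
    # Publish the two ports the Dockerfile exposes and mount the config
    # read-only; arguments after the image name go to /otelarrowcol.
    run docker run --rm \
        -p 4317:4317 -p 1777:1777 \
        -v "$(pwd)/$CONFIG:/config.yaml:ro" \
        otelarrowcol --config=/config.yaml
}

build_and_run_image
```

Because the image's `ENTRYPOINT` is the collector binary itself, everything after the image name is passed to the collector as command-line arguments.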

Makefile

13 additions, 12 deletions

@@ -11,8 +11,6 @@ MODULES := $(shell find . -name go.mod)
 
 GODIRS := $(foreach d,$(MODULES),$(shell dirname $d))
 GOCMD?= go
-GOOS := $(shell $(GOCMD) env GOOS)
-GOARCH := $(shell $(GOCMD) env GOARCH)
 BUILD_INFO=-ldflags "-X $(BUILD_INFO_IMPORT_PATH).Version=$(VERSION)"
 VERSION=$(shell git describe --always --match "v[0-9]*" HEAD)
 BUILD_INFO_IMPORT_PATH=go.opentelemetry.io/collector/internal/version
@@ -104,21 +102,24 @@ endif
 	git add .
 	git commit -m "add multimod changes $(RELEASE_CANDIDATE)" || (echo "no multimod changes to commit")
 
-# OTC's builder can be installed using:
-#
-# $(GOCMD) install go.opentelemetry.io/collector/cmd/builder@latest
-#
-# TODO install this locally
-BUILDER := builder
+# Install OTC's builder at the latest version
+BUILDER = builder
 .PHONY: $(BUILDER)
+builder:
+	$(GOCMD) install go.opentelemetry.io/collector/cmd/builder@latest
 
 .PHONY: genotelarrowcol
-genotelarrowcol:
-	$(GOCMD) install go.opentelemetry.io/collector/cmd/builder@latest
-	$(BUILDER) --skip-compilation --config collector/cmd/otelarrowcol/build.yaml --output-path collector/cmd/otelarrowcol
+genotelarrowcol: builder
+	rm -f collector/cmd/otelarrowcol/*
+	$(BUILDER) --skip-compilation --skip-get-modules --config collector/otelarrowcol-build.yaml
+	$(GOCMD) work sync
 
 .PHONY: otelarrowcol
 otelarrowcol:
 	(cd collector/cmd/otelarrowcol && \
 		GO111MODULE=on CGO_ENABLED=0 \
-		$(GOCMD) build -trimpath -o ../../../bin/otelarrowcol_$(GOOS)_$(GOARCH) $(BUILD_INFO) .)
+		$(GOCMD) build -trimpath -o ../../../bin/otelarrowcol $(BUILD_INFO) .)
+
+.PHONY: docker-otelarrowcol
+docker-otelarrowcol:
+	docker build . -t otelarrowcol
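
Reviewer note: with the targets as changed above, a local (non-Docker) build might look like the following sketch. The two `make` targets and the `bin/otelarrowcol` output path come from the Makefile in this commit; `my-config.yaml` is a hypothetical configuration file, and the `run` helper with its `DRY_RUN` switch is an illustrative assumption, not something the commit adds. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch: regenerate, build, and start otelarrowcol with a local Go toolchain.
# Assumption: run from the repository root. DRY_RUN defaults to 1 (print only).
DRY_RUN="${DRY_RUN:-1}"
# Hypothetical configuration file; substitute a real collector config.
CONFIG="${CONFIG:-my-config.yaml}"

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

build_and_run_local() {
    # genotelarrowcol installs the builder, regenerates the sources, and
    # runs `go work sync`; otelarrowcol compiles into ./bin/otelarrowcol.
    run make genotelarrowcol otelarrowcol
    run ./bin/otelarrowcol --config "$CONFIG"
}

build_and_run_local
```

Note that the binary name no longer carries the `_$(GOOS)_$(GOARCH)` suffix, so scripts that referenced the old name would need updating.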

README.md

35 additions, 70 deletions

@@ -1,10 +1,21 @@
-# OTel Arrow
+# OpenTelemetry Protocol with Apache Arrow
 
-The [OTel Arrow](https://github.com/open-telemetry/otel-arrow) project
-is an effort within [OpenTelemetry](https://opentelemetry.io/) to use
+The [OpenTelemetry Protocol with Apache
+Arrow](https://github.com/open-telemetry/otel-arrow) project is an
+effort within [OpenTelemetry](https://opentelemetry.io/) to use
 [Apache Arrow](https://arrow.apache.org/) libraries for bulk data
 transport in OpenTelemetry collection pipelines. This repository is
-the home of the OTel Arrow protocol and reference implementation.
+the home of the OpenTelemetry Protocol with Apache Arrow protocol and
+reference implementation.
+
+## Quick start
+
+Instructions for building an OpenTelemetry Collector with the modules
+in this repository are provided in [`BUILDING.md`][./collector/BUILDING.md].
+
+Examples for running the OpenTelemetry Collector with the modules in
+this repository are documented in
+[collector/examples](./collector/examples/README.md).
 
 ## Overview
 

@@ -32,23 +43,23 @@ than can be achieved using a stateless protocol.
 
 ## Project goals
 
-The OTel Arrow project is organized in phases. Our initial aim is to
+The OpenTelemetry Protocol with Apache Arrow project is organized in phases. Our initial aim is to
 facilitate traffic reduction between a pair of OpenTelemetry
-collectors, and ultimately, we believe that an end-to-end OTel Arrow
+collectors, and ultimately, we believe that an end-to-end OpenTelemetry Protocol with Apache Arrow
 pipeline will enable telemetry pipelines with substantially lower
 overhead to be built. These are our future milestones for
 OpenTelemetry and Apache Arrow integration:
 
 1. Improve compression performance for OpenTelemetry data collection
-2. Extend OpenTelemetry client SDKs to natively support the OTel Arrow Protocol
-3. Extend the OpenTelemetry collector with direct support for OTel Arrow pipelines
+2. Extend OpenTelemetry client SDKs to natively support the OpenTelemetry Protocol with Apache Arrow Protocol
+3. Extend the OpenTelemetry collector with direct support for OpenTelemetry Protocol with Apache Arrow pipelines
 4. Extend OpenTelemetry data model with support for multi-variate metrics.
 5. Output OpenTelemetry data to the Parquet file format, part of the Apache Arrow ecosystem
 
-### Improve network-level compression with OTel Arrow
+### Improve network-level compression with OpenTelemetry Protocol with Apache Arrow
 
-The first general-purpose application for OTel Arrow is traffic
-reduction. At a high-level, OTel Arrow performs the following steps
+The first general-purpose application for the project is traffic
+reduction. At a high-level, this protocol performs the following steps
 to compactly encode and transmit telemetry using Apache Arrow.
 
 1. Separate the OpenTelemetry Resource and Scope elements from the
@@ -61,10 +72,10 @@ to compactly encode and transmit telemetry using Apache Arrow.
    delta-dictionaries, and other low-level facilities to compactly
    encode the structure.
 
-Here is a diagram showing how OTel Arrow transforms OTLP Log Records
+Here is a diagram showing how the protocol transforms OTLP Log Records
 into column-oriented data, which also makes the data more compressible.
 
-![OTel Arrow Log Record](https://github.com/open-telemetry/oteps/blob/main/text/img/0156_logs_schema.png?raw=true)
+![OpenTelemetry Protocol with Apache Arrow](https://github.com/open-telemetry/oteps/blob/main/text/img/0156_logs_schema.png?raw=true)
 
 ## Project status
 

@@ -82,39 +93,39 @@ workloads.
 We are pleased to release two new collector components, presently
 housed in this this repository.
 
-- [OTel Arrow Receiver](./collector/receiver/otelarrowreceiver/README.md)
-- [OTel Arrow Exporter](./collector/exporter/otelarrowexporter/README.md)
+- [OpenTelemetry Protocol with Apache Arrow Receiver](./collector/receiver/otelarrowreceiver/README.md)
+- [OpenTelemetry Protocol with Apache Arrow Exporter](./collector/exporter/otelarrowexporter/README.md)
 
 We are working with the maintainers of the [OpenTelemetry
 Collector-Contrib](https://github.com/open-telemetry/opentelemetry-collector-contrib)
 to merge these components into that repository. [See our tracking
 issue](https://github.com/open-telemetry/opentelemetry-collector-contrib/issues/26491).
 
-The OTel Arrow exporter and receiver components are drop-in compatible
+The OpenTelemetry Protocol with Apache Arrow exporter and receiver components are drop-in compatible
 with the core collector's OTLP exporter and receiver components.
 Users with an established OTLP collection pipeline between two
 OpenTelemetry Collectors can re-build their collectors with
 `otelarrow` components, then simply replace the component name `otlp`
 with `otelarrow`. The exporter and receiver both support falling back
 to standard OTLP in case either side does not recognize the protocol,
-so the upgrade should be painless. The OTel Arrow receiver serves
-both OTel Arrow and OTLP on the standard port for OTLP gRPC (4317).
+so the upgrade should be painless. The OpenTelemetry Protocol with Apache Arrow receiver serves
+both OpenTelemetry Protocol with Apache Arrow and OTLP on the standard port for OTLP gRPC (4317).
 
 See the [Exporter](collector/exporter/otelarrowexporter/README.md) and
 [Receiver](collector/receiver/otelarrowreceiver/README.md)
 documentation for details and sample configurations.
 
 ### Project documentation
 
-This package is a reference implementation of the OTel Arrow protocol
+This package is a reference implementation of the OpenTelemetry Protocol with Apache Arrow protocol
 specified in this
 [OTEP](https://github.com/open-telemetry/oteps/blob/main/text/0156-columnar-encoding.md),
-which is currently the best source of information about OTel Arrow.
+which is currently the best source of information about OpenTelemetry Protocol with Apache Arrow.
 The [Donation
 request](https://github.com/open-telemetry/community/issues/1332)
 describes how the project began.
 
-Here are several more resources that are available to learn more about OTel Arrow.
+Here are several more resources that are available to learn more about OpenTelemetry Protocol with Apache Arrow.
 
 - [Arrow Data Model](docs/data_model.md) - Mapping OTLP entities to Arrow Schemas.
 - [Benchmark results](docs/benchmarks.md) - Based on synthetic and production data.
@@ -128,7 +139,7 @@ Here are several more resources that are available to learn more about OTel Arro
 The following chart shows the compressed message size (in bytes) as a function
 of the batch size for metrics (univariate), logs, and traces. The bottom of the
 chart shows the reduction factor for both the standard OTLP protocol (with ZSTD
-compression) and the OTel Arrow protocol (ZSTD) in comparison with an
+compression) and the OpenTelemetry Protocol with Apache Arrow protocol (ZSTD) in comparison with an
 uncompressed OTLP protocol.
 
 ![compression_ratio](./docs/img/compression_ratio_summary_std_metrics.png)
@@ -158,60 +169,14 @@ data has not been updated and this collector is still fundamentally row-oriented
 
 ![collector internal overview](docs/img/collector_internal_overview.png)
 
-> Note 2: A future phase 2 of this project will focus on implementing end-to-end OTel Arrow to improve the overall
-> performance.
+> Note 2: A future phase 2 of this project will focus on implementing end-to-end OpenTelemetry Protocol with Apache Arrow to improve the overall performance.
 
 ### Developers
 
 Pull requests are welcome. For major changes, please open an issue
 first to discuss what you would like to change. For more information, please
 read [CONTRIBUTING](CONTRIBUTING.md).
 
-#### Version history shared with OpenTelemetry Collector
-
-The OTel Arrow exporter and receiver components were derived from the
-core OpenTelemetry Collector's OTLP exporter and receiver components.
-This approach was taken to ensure that the OTel Arrow receiver and
-exporter components remain "drop-in" compatible with OTLP exporters
-and receivers.
-
-The shared version history describing how the OTel Arrow components
-were derived from the core OTLP components is recorded in [this
-repository](https://github.com/open-telemetry/otel-arrow-collector),
-which is not for general use. Maintainers of OTel Arrow may use the
-repository to maintain the version history going forward, [as discussed
-here](https://github.com/open-telemetry/otel-arrow/issues/43).
-
-#### Example collector configurations
-
-Examples demonstrating how to configure and test an OpenTelemetry
-Collector with OTel-Arrow exporter and receiver components are located
-in `./collector/examples`, including:
-
-- [`examples/bridge`](https://github.com/f5/otel-arrow-adapter/tree/main/collector/examples/bridge):
-A compression bridge between "edge" and "saas" collectors.
-- [`examples/metadata-bridge`](https://github.com/f5/otel-arrow-adapter/tree/main/collector/examples/metadata-bridge):
-A compression bridge between "edge" and "saas" collectors with metadata support, allowing request headers to transit via OTel-Arrow.
-- [`examples/loopback`](https://github.com/f5/otel-arrow-adapter/tree/main/collector/examples/loopback):
-A collector that writes Arrow to and from itself.
-- [`examples/recorder`](https://github.com/f5/otel-arrow-adapter/tree/main/collector/examples/recorder):
-A collector with support for recording data files for diagnostic and benchmark purposes.
-- [`examples/synthesize`](https://github.com/f5/otel-arrow-adapter/tree/main/collector/examples/synthesize):
-A collector with support for synthesizing telemetry data using a [telemetry-generator](https://github.com/lightstep/telemetry-generator) component.
-
-#### Miscellaneous components
-
-Several components were developed to facilitate testing and debugging
-the primary OTel Arrow components. Most importantly, these tools can
-be used to report problematic data to the OTel Arrow maintainers.
-These components are:
-
-- `exporter/fileexporter`: Derived from the upstream [fileexporter](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/exporter/fileexporter), this component supports writing files that can be read by the corresponding `filereceiver` in this package (unlike the upstream).
-- `receiver/filereceiver`: Derived from the upstream [filereceiver](https://github.com/open-telemetry/opentelemetry-collector-contrib/tree/main/receiver/filereceiver), this component supports reading files written by the corresponding `fileexporter` in this package (unlike the upstream).
-- `processor/obfuscationprocessor`: Supports obfuscation of OpenTelemetry data using a [Feistel cipher](https://en.wikipedia.org/wiki/Feistel_cipher).
-- `processor/experimentprocessor`: A probabilistic routing component for conducting experiments between exporters.
-- `connector/validationconnector`: A component for on-the-fly validation of a local pipeline.
-
 ## License
 
-OTel Arrow Protocol Adapter is licensed under Apache 2.0.
+OpenTelemetry Protocol with Apache Arrow Protocol Adapter is licensed under Apache 2.0.
