All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog and markdownlint, and this project adheres to Semantic Versioning.
- In `Dockerfile`, updated FROM instruction to `debian:11.9-slim@sha256:acc5810124f0929ab44fc7913c0ad936b074cbd3eadf094ac120190862ba36c4`
- In `requirements.txt`, updated:
  - s3fs==2024.6.0
- In `Dockerfile`, updated FROM instruction to `debian:11.9-slim@sha256:0e75382930ceb533e2f438071307708e79dc86d9b8e433cc6dd1a96872f2651d`
- In `requirements.txt`, updated:
  - azure-servicebus==7.12.2
  - boto3
  - confluent-kafka==2.4.0
  - fastavro==1.9.4
  - fastparquet==2024.2.0
  - pandas==2.2.2
  - pyarrow==16.1.0
  - s3fs==2024.5.0
- In `Dockerfile`, updated FROM instruction to `debian:11.8-slim@sha256:19664a5752dddba7f59bb460410a0e1887af346e21877fa7cec78bfd3cb77da5`
- In `requirements.txt`, updated:
  - azure-servicebus==7.11.4
  - boto3==1.28.85
  - confluent-kafka==2.3.0
  - fastavro==1.9.0
  - fastparquet==2023.10.1
  - pandas==2.1.3
  - pyarrow==14.0.1
  - websockets==12.0
- In `Dockerfile`, updated FROM instruction to `debian:11.7-slim@sha256:c618be84fc82aa8ba203abbb07218410b0f5b3c7cb6b4e7248fda7785d4f9946`
- In `requirements.txt`, updated:
  - azure-servicebus==7.11.2
  - boto3==1.28.56
  - confluent-kafka==2.2.0
  - fastavro==1.8.3
  - fastparquet==2023.8.0
  - pandas==2.1.1
  - pyarrow==13.0.0
- In `Dockerfile`, updated FROM instruction to `debian:11.7-slim@sha256:924df86f8aad741a0134b2de7d8e70c5c6863f839caadef62609c1be1340daf5`
- In `requirements.txt`, updated:
  - azure-servicebus==7.11.0
  - boto3==1.26.153
  - pandas==2.0.2
  - pyarrow==12.0.1
- In `Dockerfile`, updated FROM instruction to `BASE_IMAGE=debian:11.7-slim@sha256:f4da3f9b18fc242b739807a0fb3e77747f644f2fb3f67f4403fafce2286b431a`
- In `requirements.txt`, updated:
  - azure-servicebus==7.10.0
  - boto3==1.26.130
  - confluent-kafka==2.1.1
  - fastavro==1.7.4
  - fastparquet==2023.4.0
  - pandas==2.0.1
  - pika==1.3.2
  - pyarrow==12.0.0
  - websockets==11.0.3
- In `Dockerfile`, updated FROM instruction to `BASE_IMAGE=debian:11.6-slim@sha256:7acda01e55b086181a6fa596941503648e423091ca563258e2c1657d140355b1`
- In `requirements.txt`, updated:
  - azure-servicebus==7.8.3
  - boto3==1.26.104
  - confluent-kafka==2.0.2
  - fastavro==1.7.3
  - fastparquet==2023.2.0
  - pandas==2.0.0
  - pyarrow==11.0.0
  - websockets==11.0
- In `Dockerfile`, updated FROM instruction to `debian:11.6-slim@sha256:98d3b4b0cee264301eb1354e0b549323af2d0633e1c43375d0b25c01826b6790`
- In `requirements.txt`, updated:
  - boto3==1.26.48
  - fastparquet==2022.12.0
  - pandas==1.5.2
  - pyarrow==10.0.1
- In `Dockerfile`, updated FROM instruction to `debian:11.5-slim@sha256:e8ad0bc7d0ee6afd46e904780942033ab83b42b446b58efa88d31ecf3adf4678`
- In `requirements.txt`, updated:
  - boto3==1.25.4
  - fastavro==1.7.0
  - pandas==1.5.1
  - pika==1.3.1
  - pyarrow==10.0.0
  - websockets==10.4
- Removed support for `SENZING_DEFAULT_ENTITY_TYPE`
- Single messages are sent as JSON Objects, not JSON lists
- In `Dockerfile`, updated FROM instruction to `debian:11.5-slim@sha256:5cf1d98cd0805951484f33b34c1ab25aac7007bb41c8b9901d97e4be3cf3ab04`
- In `requirements.txt`, updated:
  - boto3==1.24.81
  - pandas==1.5.0
- Support for directory of `.json*` files
- Changed from `SENZING_AZURE_CONNECTION_STRING` to `SENZING_AZURE_QUEUE_CONNECTION_STRING` for clarity
- Upgrade `Dockerfile` to `FROM debian:11.3-slim@sha256:06a93cbdd49a265795ef7b24fe374fee670148a7973190fb798e43b3cf7c5d0f`
- JSON default wasn't a string.
- Added support for Stream loader directives:
  - `SENZING_STREAM_LOADER_DIRECTIVE_NAME`
  - `SENZING_STREAM_LOADER_DIRECTIVE_ACTION`
- Added support for Kafka configuration (`SENZING_KAFKA_CONFIGURATION`)
- Updated to Debian 11.2
- Fixed issue 95 which handles records that may have been previously dropped.
- Fixed issue 91 to properly log records that exceed the max size of a queue message.
- Updated Debian version to 10.10
- Added subcommands for Azure Queue:
- Updated Makefile to use Debian 10.10 as the base image
- Added subcommands for Azure Queue:
  - `avro-to-azure-queue`
  - `csv-to-azure-queue`
  - `gzipped-json-to-azure-queue`
  - `json-to-azure-queue`
  - `parquet-to-azure-queue`
- Updated Debian version to 10.10
- Support `s3://` protocol
- Updated Debian version to 10.9
- Added a max message size to batching for SQS, RabbitMQ, and Kafka.
- RabbitMQ virtual host is now a settable parameter.
- Removed support for adding records to a queue from a websocket. Loading records via websocket has been moved to the Senzing API server.
- Stream-producer no longer hangs if it cannot connect to the messaging server when first starting
- Support for `SENZING_DEFAULT_DATA_SOURCE` and `SENZING_DEFAULT_ENTITY_TYPE`
- Support `file://` protocol
- Added `endpoint_url` in AWS SQS configuration.
- Implemented reading csv files in chunks to reduce memory usage when loading large files. Use SENZING_CSV_ROWS_IN_CHUNK (default 10000) to set the number of rows per chunk.
- Programmable CSV delimiter. Use SENZING_CSV_DELIMITER (default is ',')
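
The two CSV entries above can be illustrated with a minimal sketch. It uses only the Python standard library and is not the project's actual implementation; the function name `read_csv_in_chunks` and the sample data are hypothetical, while the environment variable names and defaults are the ones given in the entries.

```python
import csv
import io
import os
from itertools import islice


def read_csv_in_chunks(handle, rows_in_chunk, delimiter):
    """Yield lists of row dictionaries, at most rows_in_chunk rows per list."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    while True:
        # islice pulls only the next rows_in_chunk rows, so the whole
        # file is never held in memory at once.
        chunk = list(islice(reader, rows_in_chunk))
        if not chunk:
            return
        yield chunk


# Settings named in the changelog, with the documented defaults.
rows_in_chunk = int(os.environ.get("SENZING_CSV_ROWS_IN_CHUNK", "10000"))
delimiter = os.environ.get("SENZING_CSV_DELIMITER", ",")

# Hypothetical in-memory input standing in for a large file.
sample = io.StringIO("NAME,CITY\nAnn,Oslo\nBob,Rome\nCy,Lima\n")
for chunk in read_csv_in_chunks(sample, rows_in_chunk, delimiter):
    print(len(chunk), "rows in chunk")
```

A smaller chunk size lowers peak memory at the cost of more iterations; the right value depends on row width and available memory.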
- Fixed issue #49 to handle CSV input files with empty values.
- Added support for websocket: `websocket-to-kafka`, `websocket-to-rabbitmq`, `websocket-to-sqs`, `websocket-to-sqs-batch`, `websocket-to-stdout`
- Microbatching for RabbitMQ, Kafka, and SQS. The batch of records is formatted as a JSON array.
- SENZING_RECORDS_PER_MESSAGE is the number of records to include in a single message.
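
As a rough illustration of the microbatching entry above, the sketch below groups records into JSON array messages. It is an assumption about the general technique, not the project's code: the function name `batch_messages` and the `RECORD_ID` key are hypothetical, and `records_per_message` stands in for the value of SENZING_RECORDS_PER_MESSAGE.

```python
import json


def batch_messages(records, records_per_message):
    """Yield JSON array strings holding at most records_per_message records each."""
    batch = []
    for record in records:
        batch.append(record)
        if len(batch) == records_per_message:
            yield json.dumps(batch)
            batch = []
    if batch:
        # Flush the final, possibly shorter, batch.
        yield json.dumps(batch)


# Hypothetical records; RECORD_ID is an illustrative key only.
records = [{"RECORD_ID": str(i)} for i in range(5)]
for message in batch_messages(records, 2):
    print(message)
```

Each yielded string would be sent as one queue message, so a consumer has to expect a JSON array rather than a single object.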
- Support for Governor
- Support for environment variables:
  - `SENZING_RABBITMQ_ROUTING_KEY`
  - `SENZING_RABBITMQ_USE_EXISTING_ENTITIES`
  - `SENZING_RECORD_IDENTIFIER`
  - `SENZING_RECORD_SIZE_MAX`
- Added support for gzip: `gzipped-json-to-kafka`, `gzipped-json-to-rabbitmq`, `gzipped-json-to-sqs`, `gzipped-json-to-sqs-batch`, `gzipped-json-to-stdout`
- Monitoring metrics: input_counter_rate_interval, input_counter_rate_total, output_counter_rate_interval, output_counter_rate_total
- Exit metric: rate
- Subcommands: avro-to-sqs-batch, csv-to-sqs-batch, json-to-sqs-batch, and parquet-to-sqs-batch
- Bad variable
- Support for AWS SQS queue.
- Initial functionality
- File formats: JSON, CSV, Avro, Parquet
- Queues: RabbitMQ, Kafka, STDOUT