Add Memcached Reporter #206

Merged 24 commits into openzipkin:master on Jun 30, 2021

Conversation

@Clivern (Contributor) commented Jun 9, 2021

This PR adds a Memcached reporter. It is similar to InMemory, but it stores the value in a Memcached server. It should also behave correctly under concurrent reads and writes, since it uses Memcached's check-and-set (CAS) feature: https://www.php.net/manual/en/memcached.cas.php. This requires the php-memcached extension, so it will be a breaking change.
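
For illustration, here is a minimal sketch of the check-and-set pattern this relies on (the key name and payload are hypothetical, not the PR's actual code, and it assumes ext-memcached >= 3.0 for the GET_EXTENDED flag):

<?php
$client = new \Memcached();
$client->addServer('127.0.0.1', 11211);

$payload = 'serialized span data'; // stand-in for a serialized span

// Retry until the write succeeds (a real implementation would cap retries).
do {
    $result = $client->get('zipkin_spans', null, \Memcached::GET_EXTENDED);

    if ($result === false) {
        // Key doesn't exist yet; add() fails if a concurrent writer beat us to it.
        $success = $client->add('zipkin_spans', serialize([$payload]));
    } else {
        $spans = unserialize($result['value']);
        $spans[] = $payload;
        // cas() writes only if nobody modified the key since our read, so
        // concurrent writers retry instead of overwriting each other.
        $success = $client->cas($result['cas'], 'zipkin_spans', serialize($spans));
    }
} while (!$success);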

Usage

In the application, we use the Memcached reporter instead of the HTTP reporter.

<?php
// app.php
include_once dirname(__FILE__) . "/vendor/autoload.php";

use Zipkin\Endpoint;
use Zipkin\Samplers\BinarySampler;
use Zipkin\TracingBuilder;
use Zipkin\Reporters\Aggregation\MemcachedClient;
use Zipkin\Reporters\Memcached;

$endpoint = Endpoint::create('clivern');

$reporter = new Memcached([], new MemcachedClient('127.0.0.1'));

$sampler = BinarySampler::createAsAlwaysSample();
$tracing = TracingBuilder::create()
    ->havingLocalEndpoint($endpoint)
    ->havingSampler($sampler)
    ->havingReporter($reporter)
    ->build();

$tracer = $tracing->getTracer();

for ($i=1; $i <= 10000; $i++) {
  $span = $tracer->newTrace();
  $span->setName(sprintf("span_%d", $i));
  $span->start();

  try {
    // something quite fast
    var_dump("test");
  } finally {
    $span->finish();
  }

  $tracer->flush();
}

Then we create a cron job or a command that reads from the Memcached reporter and sends the spans to Zipkin using the HTTP reporter. We could even run this as application middleware on a time interval.

<?php
// daemon.php
include_once dirname(__FILE__) . "/vendor/autoload.php";

use Zipkin\Reporters\Aggregation\MemcachedClient;
use Zipkin\Reporters\Memcached;
use Zipkin\Reporters\Http;

$reporter = new Memcached([], new MemcachedClient('127.0.0.1'));
$httpReporter = new Http(['endpoint_url' => 'http://zipkin.local:9411/api/v2/spans']);

while (true) {
    $spans = $reporter->flush();

    if (!empty($spans)){
        $httpReporter->report($spans);
    }

    sleep(5);
}

Flushing can be based on a time interval or on the size of the stored value (Memcached supports values up to 1MB by default, which is roughly 1,000,000 characters). I didn't add any implementation for this, since it depends entirely on the developer and on the scale at which Zipkin is used: reporting 2 spans/second calls for a different implementation than 1000 spans/second, and I don't think this should be part of the package. If you would hit the 1MB size limit within a second, you can either increase Memcached's maximum object size or report under different Memcached keys, i.e. split the collected spans across several keys to get more space before flushing.
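
As a rough illustration of the key-splitting idea (entirely hypothetical and not part of this PR), writers pick one of N shard keys and the flush job drains all of them:

<?php
// Hypothetical sharding helpers; key names are illustrative.
const SHARDS = 4;

// Writers append the span to a randomly chosen shard instead of one key,
// so each stored value stays well under the 1MB default object size.
function shardKey(): string
{
    return sprintf('zipkin_spans_%d', random_int(0, SHARDS - 1));
}

// The flush job drains every shard before reporting over HTTP.
function allShardKeys(): array
{
    $keys = [];
    for ($i = 0; $i < SHARDS; $i++) {
        $keys[] = sprintf('zipkin_spans_%d', $i);
    }
    return $keys;
}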

Please let me know if something needs to be adjusted!

Fix #22

Update:

To run as a single process:

<?php

include_once dirname(__FILE__) . "/vendor/autoload.php";

use Zipkin\Endpoint;
use Zipkin\Samplers\BinarySampler;
use Zipkin\TracingBuilder;
use Zipkin\Reporters\Aggregation\MemcachedClient;
use Zipkin\Reporters\Memcached;
use Zipkin\Reporters\Http;


$endpoint = Endpoint::create('orders_service');

$httpReporter = new Http([
  'endpoint_url' => 'http://zipkin.local:9411/api/v2/spans'
]);

$reporter = new Memcached(
    [],
    $httpReporter,
    new MemcachedClient('127.0.0.1')
);

$sampler = BinarySampler::createAsAlwaysSample();
$tracing = TracingBuilder::create()
    ->havingLocalEndpoint($endpoint)
    ->havingSampler($sampler)
    ->havingReporter($reporter)
    ->build();

$tracer = $tracing->getTracer();

for ($i=1; $i <= 400; $i++) {
  $span = $tracer->newTrace();
  $span->setName(sprintf("span_%d", $i));
  $span->start();

  try {
    var_dump("report");
  } finally {
    $span->finish();
  }

  $tracer->flush();
}

@jcchavezs (Contributor) commented

@Clivern, thanks for starting this.

> Then we create a cron job or a command that reads from the Memcached reporter and sends the spans to Zipkin using the HTTP reporter. We could even run this as application middleware on a time interval.

I have a question regarding this: do we truly need such a daemon? I can think of an alternative: on each report, if the time interval has passed or the max size has been reached, we run the reporter. Does that sound reasonable?

We should also consider the max size for reporting. Even with the daemon, we should know how big the reporting payload is, because the server might have a limit (e.g. in zipkin-go it is https://github.com/openzipkin/zipkin-go/blob/master/reporter/http/http.go#L37).

@Clivern (Contributor, Author) commented Jun 10, 2021

Thanks @jcchavezs for the review, I will address the comments case by case.

@Clivern (Contributor, Author) commented Jun 10, 2021

> @Clivern, thanks for starting this.
>
> I have a question regarding this: do we truly need such a daemon? I can think of an alternative: on each report, if the time interval has passed or the max size has been reached, we run the reporter. Does that sound reasonable?
>
> We should also consider the max size for reporting. Even with the daemon, we should know how big the reporting payload is, because the server might have a limit (e.g. in zipkin-go it is https://github.com/openzipkin/zipkin-go/blob/master/reporter/http/http.go#L37).

It makes sense if you want to collect and send from one process. I tried to be flexible and didn't want to enforce a certain approach; if you remove the daemon, you can simply run the code below on each incoming request to the application, like middleware, but that is still a lot to do indeed. So I can add that piece of logic too, so the Memcached reporter does the aggregation and reports to Zipkin once a configurable size or time interval is reached. The only issue I see is that when the reporting time is reached, reporting to Zipkin will slow things down a bit, but that should be fine.

<?php

include_once dirname(__FILE__) . "/vendor/autoload.php";

use Zipkin\Reporters\Aggregation\MemcachedClient;
use Zipkin\Reporters\Memcached;
use Zipkin\Reporters\Http;

$reporter = new Memcached([], new MemcachedClient('127.0.0.1'));
$httpReporter = new Http(['endpoint_url' => 'http://zipkin.local:9411/api/v2/spans']);

$last_reporting_time = 0; // placeholder: fetch the last reporting time from memcached here

if (time() - $last_reporting_time > 86400) {
    $spans = $reporter->flush();

    if (!empty($spans)){
        $httpReporter->report($spans);
    }
}

@jcchavezs (Contributor) commented

So I think:

  1. We can store all reported traces in memcached, serialized using serialize, join them on every report, and flush them to HTTP after x seconds (batch_interval) or after a maximum number of accumulated traces (batch_size). For the latter we need a counter for the number of spans; if we decide to join the serialized data manually, it already includes a counter in the third position (e.g. a:3:{i:0;O:1:). When it is time to report, we unserialize all the chunks, join all the traces together, and send them over HTTP. The reason for using serialize/unserialize over json_encode/json_decode is that we can't turn a json_decoded array back into a ReadbackSpan, whereas serialize/unserialize can do so without overhead; storing JSON would require a method for turning the json_decoded array into a ReadbackSpan just for the sake of this reporter, which doesn't seem like enough reason to introduce a breaking change in the interface. (See the sketch below.)
  2. We could provide both the daemon and single-process reporting; we just need to make clear in the docs that batch_interval=-1 and batch_size=-1 will skip all reporting (cc @basvanbeek). The daemon would do just the reporting and wouldn't accept these options, because the execution of a cron can't be controlled by our PHP code.

Does that make sense?
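
For illustration, a minimal sketch of the serialize-and-merge idea from point 1 (FakeSpan is a hypothetical stand-in; the real reporter stores ReadbackSpan objects):

<?php
// Stand-in for ReadbackSpan: serialize() preserves objects, while
// json_encode()/json_decode() would flatten them to plain arrays.
class FakeSpan
{
    /** @var string */
    private $name;

    public function __construct(string $name)
    {
        $this->name = $name;
    }
}

// Chunks as they would accumulate in memcached, one per report() call.
// Note the element counter at the start of each chunk, e.g. a:2:{...}.
$chunks = [
    serialize([new FakeSpan('span_1')]),
    serialize([new FakeSpan('span_2'), new FakeSpan('span_3')]),
];

// At flush time: unserialize each chunk and join the spans back together.
$all = [];
foreach ($chunks as $chunk) {
    $all = array_merge($all, unserialize($chunk));
}

var_dump(count($all)); // int(3), ready to hand to the HTTP reporter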

@Clivern (Contributor, Author) commented Jun 11, 2021

> So I think:
>
>   1. We can store all reported traces in memcached, serialized using serialize, join them on every report, and flush them to HTTP after x seconds (batch_interval) or after a maximum number of accumulated traces (batch_size). [...]
>   2. We could provide both the daemon and single-process reporting; we just need to make clear in the docs that batch_interval=-1 and batch_size=-1 will skip all reporting.
>
> Does that make sense?

Yes, I will apply the first point first (aggregate & report in a single process), and from there we can see how to make it easy for everyone to opt in and opt out by changing the batch_interval and batch_size values.

@jcchavezs (Contributor) commented

This is great work @Clivern. I just added a couple of comments; after that I am happy to merge it.

@Clivern (Contributor, Author) commented Jun 20, 2021

> This is great work @Clivern. I just added a couple of comments; after that I am happy to merge it.

@jcchavezs I added test cases, made a few changes, and tested it manually, so please review it again.

@jcchavezs (Contributor) commented

Great work @Clivern! I will merge it as soon as it gets green.

@Clivern (Contributor, Author) commented Jun 30, 2021

> Great work @Clivern! I will merge it as soon as it gets green.

It passes locally now; just trigger the build one more time.

$ zipkin-php git:(master) composer static-check
> phpstan analyse src --level 8
Note: Using configuration file /Users/Clivern/oss/zipkin-php/phpstan.neon.
 72/72 [▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓] 100%


 [OK] No errors

$ composer test

> phpunit tests
PHPUnit 9.5.5 by Sebastian Bergmann and contributors.

...............................................................  63 / 210 ( 30%)
............................................................... 126 / 210 ( 60%)
............................................................... 189 / 210 ( 90%)
.....................                                           210 / 210 (100%)

Time: 00:01.358, Memory: 14.00 MB

OK (210 tests, 477 assertions)

@jcchavezs (Contributor) commented

Do you mind rebasing? I just added #207, which I think will fix the problem. If it does not, I will merge this PR as it is.

@Clivern (Contributor, Author) commented Jun 30, 2021

Done.

@jcchavezs jcchavezs merged commit 0e9230e into openzipkin:master Jun 30, 2021
@Clivern (Contributor, Author) commented Jun 30, 2021

It seems the GitHub Action shivammathur/setup-php doesn't install the memcached extension on Windows, even though it is listed there.

@jcchavezs (Contributor) commented Jun 30, 2021 via email

@Clivern (Contributor, Author) commented Jun 30, 2021

Nope, it doesn't work; I tested it. They don't mention Windows at all, it's all about Ubuntu. Not surprised! It must be hard to get things right on Windows (download the extension DLL file and append it to php.ini).

The only workaround I can think of right now is to add the following to the top of the test cases in https://github.com/openzipkin/zipkin-php/blob/master/tests/Unit/Reporters/MemcachedTest.php. In any case, the SDK must work even if memcached isn't there, and the repository's Ubuntu workflow works fine with the memcached extension.

if (!class_exists('\Memcached')) {
    $this->markTestSkipped('Memcached extension is missing.');
    return;
}

$ composer test
Do not run Composer as root/super user! See https://getcomposer.org/root for details
> phpunit tests
PHPUnit 9.5.6 by Sebastian Bergmann and contributors.

...............................................................  63 / 210 ( 30%)
............................................................... 126 / 210 ( 60%)
.....................SSSSSSSSSSSSSS............................ 189 / 210 ( 90%)
.....................                                           210 / 210 (100%)

Time: 00:00.172, Memory: 12.00 MB

OK, but incomplete, skipped, or risky tests!
Tests: 210, Assertions: 447, Skipped: 14.

In that case we will have to exclude these two files from the phpstan analysis by adding the following to the end of phpstan.neon. So far I haven't found a better way to bypass the "class Memcached not found" error when the class doesn't exist.

    excludePaths:
        analyse:
            - src/Zipkin/Reporters/Aggregation/MemcachedClient.php
            - src/Zipkin/Reporters/Memcached.php

@jcchavezs (Contributor) commented Jun 30, 2021 via email

@jcchavezs (Contributor) commented

I moved this code temporarily to a branch. I need to release 3.0 and I would like to make some tweaks to the implementation, but I don't want to block 3.0, so this will most likely be included in 3.1.
