Add Memcached Reporter #206

Merged 24 commits into openzipkin:master on Jun 30, 2021

Conversation

@Clivern (Contributor) commented Jun 9, 2021

This PR adds a Memcached reporter. It is similar to InMemory, but it stores the value in a Memcached server. It should also behave correctly under concurrent reads and writes, since it uses Memcached's check-and-set (CAS) feature: https://www.php.net/manual/en/memcached.cas.php. This requires the php-memcached extension, so it will be a breaking change.
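
For illustration, here is a minimal sketch of the check-and-set pattern this relies on (the key name and payload are hypothetical, not the PR's actual code, and it assumes ext-memcached >= 3.0 for the GET_EXTENDED flag):

<?php
$client = new \Memcached();
$client->addServer('127.0.0.1', 11211);

$payload = 'serialized span data'; // stand-in for a serialized span

// Retry until the write succeeds (a real implementation would cap retries).
do {
    $result = $client->get('zipkin_spans', null, \Memcached::GET_EXTENDED);

    if ($result === false) {
        // Key doesn't exist yet; add() fails if a concurrent writer beat us to it.
        $success = $client->add('zipkin_spans', serialize([$payload]));
    } else {
        $spans = unserialize($result['value']);
        $spans[] = $payload;
        // cas() writes only if nobody modified the key since our read, so
        // concurrent writers retry instead of overwriting each other.
        $success = $client->cas($result['cas'], 'zipkin_spans', serialize($spans));
    }
} while (!$success);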

Usage

In the application, we use the Memcached reporter instead of the HTTP reporter.

<?php
// app.php
include_once dirname(__FILE__) . "/vendor/autoload.php";

use Zipkin\Endpoint;
use Zipkin\Samplers\BinarySampler;
use Zipkin\TracingBuilder;
use Zipkin\Reporters\Aggregation\MemcachedClient;
use Zipkin\Reporters\Memcached;

$endpoint = Endpoint::create('clivern');

$reporter = new Memcached([], new MemcachedClient('127.0.0.1'));

$sampler = BinarySampler::createAsAlwaysSample();
$tracing = TracingBuilder::create()
    ->havingLocalEndpoint($endpoint)
    ->havingSampler($sampler)
    ->havingReporter($reporter)
    ->build();

$tracer = $tracing->getTracer();

for ($i=1; $i <= 10000; $i++) {
  $span = $tracer->newTrace();
  $span->setName(sprintf("span_%d", $i));
  $span->start();

  try {
    // something quite fast
    var_dump("test");
  } finally {
    $span->finish();
  }

  $tracer->flush();
}

Then we create a cron job or a command that reads from the Memcached reporter and sends the spans to Zipkin using the HTTP reporter. We could even run this as application middleware on a time interval.

<?php
// daemon.php
include_once dirname(__FILE__) . "/vendor/autoload.php";

use Zipkin\Reporters\Aggregation\MemcachedClient;
use Zipkin\Reporters\Memcached;
use Zipkin\Reporters\Http;

$reporter = new Memcached([], new MemcachedClient('127.0.0.1'));
$httpReporter = new Http(['endpoint_url' => 'http://zipkin.local:9411/api/v2/spans']);

while (true) {
    $spans = $reporter->flush();

    if (!empty($spans)){
        $httpReporter->report($spans);
    }

    sleep(5);
}

Flushing can be based on a time interval or on the size of the stored value (Memcached supports values up to 1MB by default, which is roughly 1,000,000 characters). I didn't add any implementation for this, since it depends entirely on the developer and on the scale at which Zipkin is used: reporting 2 spans/second calls for a different implementation than 1000 spans/second, and I don't think this should be part of the package. If you would hit the 1MB size limit within a second, you can either increase Memcached's maximum object size or report under different Memcached keys, i.e. split the collected spans across several keys to get more space before flushing.
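
As a rough illustration of the key-splitting idea (entirely hypothetical and not part of this PR), writers pick one of N shard keys and the flush job drains all of them:

<?php
// Hypothetical sharding helpers; key names are illustrative.
const SHARDS = 4;

// Writers append the span to a randomly chosen shard instead of one key,
// so each stored value stays well under the 1MB default object size.
function shardKey(): string
{
    return sprintf('zipkin_spans_%d', random_int(0, SHARDS - 1));
}

// The flush job drains every shard before reporting over HTTP.
function allShardKeys(): array
{
    $keys = [];
    for ($i = 0; $i < SHARDS; $i++) {
        $keys[] = sprintf('zipkin_spans_%d', $i);
    }
    return $keys;
}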

Please let me know if something needs to be adjusted!

Fix #22

Update:

To run as a single process:

<?php

include_once dirname(__FILE__) . "/vendor/autoload.php";

use Zipkin\Endpoint;
use Zipkin\Samplers\BinarySampler;
use Zipkin\TracingBuilder;
use Zipkin\Reporters\Aggregation\MemcachedClient;
use Zipkin\Reporters\Memcached;
use Zipkin\Reporters\Http;


$endpoint = Endpoint::create('orders_service');

$httpReporter = new Http([
  'endpoint_url' => 'http://zipkin.local:9411/api/v2/spans'
]);

$reporter = new Memcached(
    [],
    $httpReporter,
    new MemcachedClient('127.0.0.1')
);

$sampler = BinarySampler::createAsAlwaysSample();
$tracing = TracingBuilder::create()
    ->havingLocalEndpoint($endpoint)
    ->havingSampler($sampler)
    ->havingReporter($reporter)
    ->build();

$tracer = $tracing->getTracer();

for ($i=1; $i <= 400; $i++) {
  $span = $tracer->newTrace();
  $span->setName(sprintf("span_%d", $i));
  $span->start();

  try {
    var_dump("report");
  } finally {
    $span->finish();
  }

  $tracer->flush();
}

@jcchavezs (Contributor) commented

@Clivern, thanks for starting this.

> Then we create a cron job or a command that reads from the Memcached reporter and sends the spans to Zipkin using the HTTP reporter. We could even run this as application middleware on a time interval.

I have a question regarding this: do we truly need such a daemon? I can think of an alternative: on each report, if the time interval has passed or the max size has been reached, we run the reporter. Does that sound reasonable?

We should also consider the max size for reporting. Even with the daemon, we should know how big the reporting payload is, because the server might have a limit (e.g. in zipkin-go it is https://github.com/openzipkin/zipkin-go/blob/master/reporter/http/http.go#L37).

@Clivern (Contributor, Author) commented Jun 10, 2021

Thanks @jcchavezs for the review, I will address the comments case by case.

@Clivern (Contributor, Author) commented Jun 10, 2021

> @Clivern, thanks for starting this.
>
> I have a question regarding this: do we truly need such a daemon? I can think of an alternative: on each report, if the time interval has passed or the max size has been reached, we run the reporter. Does that sound reasonable?
>
> We should also consider the max size for reporting. Even with the daemon, we should know how big the reporting payload is, because the server might have a limit (e.g. in zipkin-go it is https://github.com/openzipkin/zipkin-go/blob/master/reporter/http/http.go#L37).

It makes sense if you want to collect and send from one process. I tried to be flexible and didn't want to enforce a certain approach; if you remove the daemon, you can simply run the code below on each incoming request to the application, like middleware, but that is still a lot to do indeed. So I can add that piece of logic too, so the Memcached reporter does the aggregation and reports to Zipkin once a configurable size or time interval is reached. The only issue I see is that when the reporting time is reached, reporting to Zipkin will slow things down a bit, but that should be fine.

<?php

include_once dirname(__FILE__) . "/vendor/autoload.php";

use Zipkin\Reporters\Aggregation\MemcachedClient;
use Zipkin\Reporters\Memcached;
use Zipkin\Reporters\Http;

$reporter = new Memcached([], new MemcachedClient('127.0.0.1'));
$httpReporter = new Http(['endpoint_url' => 'http://zipkin.local:9411/api/v2/spans']);

$last_reporting_time = 0; // placeholder: fetch the last reporting time from memcached here

if (time() - $last_reporting_time > 86400) {
    $spans = $reporter->flush();

    if (!empty($spans)){
        $httpReporter->report($spans);
    }
}

@jcchavezs (Contributor) commented

So I think:

  1. We can store all reported traces in memcached, serialized using serialize, join them on every report, and flush them to HTTP after x seconds (batch_interval) or after a maximum number of accumulated traces (batch_size). For the latter we need a counter for the number of spans; if we decide to join the serialized data manually, it already includes a counter in the third position (e.g. a:3:{i:0;O:1:). When it is time to report, we unserialize all the chunks, join all the traces together, and send them over HTTP. The reason for using serialize/unserialize over json_encode/json_decode is that we can't turn a json_decoded array back into a ReadbackSpan, whereas serialize/unserialize can do so without overhead; storing JSON would require a method for turning the json_decoded array into a ReadbackSpan just for the sake of this reporter, which doesn't seem like enough reason to introduce a breaking change in the interface. (See the sketch below.)
  2. We could provide both the daemon and single-process reporting; we just need to make clear in the docs that batch_interval=-1 and batch_size=-1 will skip all reporting (cc @basvanbeek). The daemon would do just the reporting and wouldn't accept these options, because the execution of a cron can't be controlled by our PHP code.

Does that make sense?
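
For illustration, a minimal sketch of the serialize-and-merge idea from point 1 (FakeSpan is a hypothetical stand-in; the real reporter stores ReadbackSpan objects):

<?php
// Stand-in for ReadbackSpan: serialize() preserves objects, while
// json_encode()/json_decode() would flatten them to plain arrays.
class FakeSpan
{
    /** @var string */
    private $name;

    public function __construct(string $name)
    {
        $this->name = $name;
    }
}

// Chunks as they would accumulate in memcached, one per report() call.
// Note the element counter at the start of each chunk, e.g. a:2:{...}.
$chunks = [
    serialize([new FakeSpan('span_1')]),
    serialize([new FakeSpan('span_2'), new FakeSpan('span_3')]),
];

// At flush time: unserialize each chunk and join the spans back together.
$all = [];
foreach ($chunks as $chunk) {
    $all = array_merge($all, unserialize($chunk));
}

var_dump(count($all)); // int(3), ready to hand to the HTTP reporter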

@Clivern (Contributor, Author) commented Jun 11, 2021

> So I think:
>
>   1. We can store all reported traces in memcached, serialized using serialize, join them on every report, and flush them to HTTP after x seconds (batch_interval) or after a maximum number of accumulated traces (batch_size). [...]
>   2. We could provide both the daemon and single-process reporting; we just need to make clear in the docs that batch_interval=-1 and batch_size=-1 will skip all reporting.
>
> Does that make sense?

Yes, I will apply the first point first (aggregate & report in a single process), and from there we can see how to make it easy for everyone to opt in and opt out by changing the batch_interval and batch_size values.

@jcchavezs (Contributor) commented

This is great work @Clivern. I just added a couple of comments; after that I am happy to merge it.

@Clivern (Contributor, Author) commented Jun 20, 2021

> This is great work @Clivern. I just added a couple of comments; after that I am happy to merge it.

@jcchavezs I added test cases, made a few changes, and tested it manually, so please review it again.

@jcchavezs (Contributor) commented

Great work @Clivern! I will merge it as soon as it gets green.

@Clivern (Contributor, Author) commented Jun 30, 2021

> Great work @Clivern! I will merge it as soon as it gets green.

It passes locally now; just trigger the build one more time.

$ zipkin-php git:(master) composer static-check
> phpstan analyse src --level 8
Note: Using configuration file /Users/Clivern/oss/zipkin-php/phpstan.neon.
 72/72 [▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓▓] 100%


 [OK] No errors

$ composer test

> phpunit tests
PHPUnit 9.5.5 by Sebastian Bergmann and contributors.

...............................................................  63 / 210 ( 30%)
............................................................... 126 / 210 ( 60%)
............................................................... 189 / 210 ( 90%)
.....................                                           210 / 210 (100%)

Time: 00:01.358, Memory: 14.00 MB

OK (210 tests, 477 assertions)

@jcchavezs (Contributor) commented

Do you mind rebasing? I just added #207, which I think will fix the problem. If it does not, I will merge this PR as it is.

@Clivern (Contributor, Author) commented Jun 30, 2021

Done.

@jcchavezs jcchavezs merged commit 0e9230e into openzipkin:master Jun 30, 2021
@Clivern (Contributor, Author) commented Jun 30, 2021

It seems the GitHub Action shivammathur/setup-php doesn't install the memcached extension on Windows, even though it is listed there.

@jcchavezs (Contributor) commented Jun 30, 2021 via email

@Clivern (Contributor, Author) commented Jun 30, 2021

Nope, it doesn't work; I tested it. They don't mention Windows at all, it's all about Ubuntu. Not surprised! It must be hard to get things right on Windows (download the extension DLL file and append it to php.ini).

The only workaround I can think of right now is to add the following to the top of the test cases in https://github.com/openzipkin/zipkin-php/blob/master/tests/Unit/Reporters/MemcachedTest.php. In any case, the SDK must work even if memcached isn't there, and the repository's Ubuntu workflow works fine with the memcached extension.

if (!class_exists('\Memcached')) {
    $this->markTestSkipped('Memcached extension is missing.');
    return;
}

$ composer test
Do not run Composer as root/super user! See https://getcomposer.org/root for details
> phpunit tests
PHPUnit 9.5.6 by Sebastian Bergmann and contributors.

...............................................................  63 / 210 ( 30%)
............................................................... 126 / 210 ( 60%)
.....................SSSSSSSSSSSSSS............................ 189 / 210 ( 90%)
.....................                                           210 / 210 (100%)

Time: 00:00.172, Memory: 12.00 MB

OK, but incomplete, skipped, or risky tests!
Tests: 210, Assertions: 447, Skipped: 14.

In that case we will have to exclude these two files from the phpstan analysis by adding the following to the end of phpstan.neon. So far I haven't found a better way to bypass the "class Memcached not found" error when the class doesn't exist.

    excludePaths:
        analyse:
            - src/Zipkin/Reporters/Aggregation/MemcachedClient.php
            - src/Zipkin/Reporters/Memcached.php

@jcchavezs (Contributor) commented Jun 30, 2021 via email

@jcchavezs (Contributor) commented

I moved this code temporarily to a branch. I need to release 3.0 and I would like to make some tweaks to the implementation, but I don't want to block 3.0, so this will most likely be included in 3.1.
