Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable propagation of 128-bit TraceID #651

Closed
wants to merge 19 commits into from

Conversation

jimschubert
Copy link
Contributor

@jimschubert jimschubert commented Sep 21, 2017

Problem

See #564

Finagle currently only "tolerates" 128-bit TraceID by dropping the high 64-bits.

My target tracing system uses all 128-bit ids (trace, parent, and span) generated from UUIDs, so support for propagating 128-bit TraceID is preferable.

Solution

Allows 128-bit TraceID to be consumed and propagated through the stack while defaulting to generating 64-bit TraceIDs.

Result

Finagle tracing continues to also support 64-bit trace identifiers. The upper 64-bit of a 128-bit TraceID is included on the wire format as below:

Big Endian spanId:8 parentId:8 traceId:8 flags:8 (traceIdHigh:8)

Here if the upper 8 bytes are set, they're treated as the upper 64-bit part of a TraceID.

Breaking: TraceId serialization becomes 40 bytes on the wire.

Question

I started down the path of generating 128-bit spanId and parentId, but I'm not sure of a good way to support this in Trace#tryUnmarshal so I decided to open the PR for feedback. Is this supported elsewhere? I've read some discussion that says TraceID makes sense in high traffic environments, while spans are generally fewer and unlikely to cause collisions. I can remove the flag, but wanted to gather thoughts first.

Allows 128-bit TraceID to be consumed while defaulting to generating
64-bit Spans.

Adds GlobalFlag c.t.f.tracing.generate128BitSpanIds to enable the
generation of 128-bit SpanIDs.
@codefromthecrypt
Copy link
Contributor

The trace ID has cause to consider 128bit, but parent and span ID not. It is not likely at all to get a clash within a trace. Since neither B3 nor its successor TraceContext are on a track to have 128 bit span IDs, I would pull it from this change. For example, if you make span ID 128 bit, it will break and restart a trace on most things I am aware of (like istio, grpc/census and everything in zipkin)

Copy link
Contributor

@codefromthecrypt codefromthecrypt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for starting this!


val flags = Flags(flags64)
val sampled = if (flags.isFlagSet(Flags.SamplingKnown)) {
if (flags.isFlagSet(Flags.Sampled)) someTrue else someFalse
} else None

val traceId = TraceId(
if (trace64 == parent64) None else Some(SpanId(trace64)),
if (trace64 == parent64) None else Some(SpanId(traceIdHigh, trace64)),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For compat maybe better to add another spanid for high bits vs expand spanid to 128

This adds a new span, traceIdHigh, to TraceId to support the high bits
of a 128bit TraceID rather than truncating to 64bits.

This does not include generation of 128bit TraceID.
@jimschubert
Copy link
Contributor Author

@adriancole I updated this PR to include an additional span to account for the high bits, rather than extending SpanId support to 128bits. This is definitely a bit cleaner.

I had some concerns and TODO items.

  1. I overloaded TraceId#apply to reduce impact on existing codebases. However, adding the _traceIdHigh parameter to the case class requires an additional parameter for extractors. I had considered explicitly creating a TraceId#unapply to match the old signature, but I thought this may lead to unwanted results. I tend to have only a single extractor per object, and I didn't see any guidelines or clear examples in the code to suggest otherwise.

  2. I wasn't sure how to handle the conditional aspects of the additional 8 bytes in different representations of TraceId. For example, in TraceId#toString I only prepend traceIdHigh if it is non-zero. Yet in TraceId#serialize it will always serialize to a byte array of size 40.

  3. I'm not familiar with finagle-mux. I was wondering if someone else would pick up adding support for 128bit TraceID propagation there.

  4. This one isn't really a big deal. finagle-http previously relied on SpanId#fromString to parse the TraceID value from the header. To convert a string representing that 128bit TraceID into the low span and high span, I created TraceId#mk128BitTraceId. I didn't really like the name, or that it's only used by finagle-http and living in finagle-core but it makes more sense to me that it lives on the TraceId object.

TODO items:

  • tracing.thrift doesn't seem to generate compilable code using thrift 0.9 or 0.10 with the commands at at the top of the file. I added RequestHeader#trace_id_high at position 11. Is there a custom build of thrift, or instructions somewhere for generating tracing.thrift with the correct output?
  • Test/update finagle-thrift after tracing.thrift is regenerated and RichRequestHeader is updated.

@codefromthecrypt
Copy link
Contributor

codefromthecrypt commented Sep 26, 2017 via email

Copy link
Contributor

@codefromthecrypt codefromthecrypt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

nice work

@jimschubert jimschubert changed the title Enable 128-bit TraceID, and optionally SpanIDs Enable propagation of 128-bit TraceID Sep 27, 2017
ByteArrays.put64be(bytes, 8, traceId.parentId.toLong)
ByteArrays.put64be(bytes, 16, traceId.traceId.toLong)
val bytes = new Array[Byte](40)
ByteArrays.put64be(bytes, 0, traceId.spanId.self)
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I switched over to my open IntelliJ to start looking at a different area of Finagle, unrelated to this PR, and this file was open. I remembered that these were previously toLong, and here I have changed them to self. I wasn't even thinking about the change when I did this, I was just directly referencing the property rather than the helper method . Should I go ahead and change these (in Id.scala, IdTest.scala) back to toLong?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest leaving it as toLong for clarity

Copy link
Contributor

@bryce-anderson bryce-anderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good, thanks for the PR! Most of my suggestions are asides as I've not really dug into this code before.

ByteArrays.put64be(bytes, 0, traceId.spanId.toLong)
ByteArrays.put64be(bytes, 8, traceId.parentId.toLong)
ByteArrays.put64be(bytes, 16, traceId.traceId.toLong)
val bytes = new Array[Byte](40)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we gate the size of the array to 32 if !traceId.traceIdHigh.isDefined. Otherwise, even if services are using the 64-bit trace-ids, they will need to have received this patch in order to support tracing.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had thought about doing this as well, but I wasn't sure if which would be the correct approach. I'll make the change.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In rest of zipkin we treat lack of traceIDHigh as unset. There's no binary format at the moment, and at some point there will be. However, here I think it is important to remain in "compat mode" with 32byte context when there's no traceIdHigh (or it is zero). This allows folks to upgrade more slowly.

*
* @return An <code>(Option[SpanId], Option[SpanId])</code> of 64bit or 128bit TraceID. Structure is (high, low).
*/
def mk128BitTraceId(spanId: String): (Option[SpanId], Option[SpanId]) = {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we make a named case class to return these SpanId's instead of using a Tuple2? I would have expected the low bytes to come first, but it's just a state of mind thing so giving it a name would make it unambiguous.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks. I meant to mark that as another question (conventions for creating one-off case classes like this) but forgot to mention it in the PR.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Was just getting around to this, and remembered a major part of the question. I wanted to call that case class just "TraceId", but this is already taken from the case class encapsulating the traceId, parentId, spanId, etc. In other implementations, I think I've seen the current TraceId named things like TraceContext or SpanContext. I had considered a case class namedFullTraceId, but I was concerned this may cause confusion with the existing TraceId type.

Since a simple "TraceId" is already taken, are there any suggestions for naming here?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if this were a java builder, I would just say something like TraceId.Builder.parseTraceId(String) or similar, which internally sets one of the two. Maybe you could consider making a constructor of TraceId which takes a string form of TraceId and internally does the same?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree that the naming is now confusing at best, but don't think its something that should be addressed in this PR. If we want to take the easy route out of this, how about SpanId128. Might want to rename the method as well, since it seems (at least to me, happy to hear why I'm wrong) that we're really dealing in span-ids.

try {
val length = spanId.length()
val lower64Bits = if (length <= 16) spanId else spanId.substring(length - 16)
val upper64bits = if (length == 32) Some(spanId.substring(0, 16)) else None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you use the if / else block to form the whole SpanId instead of just the substring? That would make the match below unnecessary.

} else {
val span64 = ByteArrays.get64be(bytes, 0)
val parent64 = ByteArrays.get64be(bytes, 8)
val trace64 = ByteArrays.get64be(bytes, 16)
val flags64 = ByteArrays.get64be(bytes, 24)

val traceIdHigh = if(bytes.length == 40) ByteArrays.get64be(bytes, 32) else 0L
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why not Option[SpanId] instead of Long? Looks like your putting a Some(SpanId(0L)) into the TraceId.apply below due to this (same in Trace.scala). Is this the only situations where you would expect to find a SpanId of 0? If so, lets do _traceIdHigh -> traceIdHigh and remove the method: I know that case class is pretty hairy at this point but I'd prefer the generated unapply method to yield as accurate results as possible.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I did it this way with the special cased 0L, is that systems requiring 128bit traces may be padded with zeros. On my team at Expedia, we hacked the 64bit trace ids to fit our tracing system by transforming the traces generated by Finagle into a UUID with zeroed most significant bits (example: new java.util.UUID(0,traceId)). If the special case is dropped here and not cleaned up via the traceId method, I assumed that would cause problems with allowing a traceIdHigh equal to 0L.

I may have misunderstood what @adriancole said with 0L being invalid. To clarify, my understanding was that both of these are invalid:
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 7fff ffff ffff ffff

Is the second case a valid 128bit identifier? If it is valid, then I agree we should just take the bits as they come and forward them along.

It's a little different from the tracing system I'll be integrating with, in which the identifiers are string representations of UUIDs with no constraint on the high 64bits (see haystack-idl).

@@ -179,6 +223,12 @@ final case class TraceId(
case Some(id) => id
}

def traceIdHigh: Option[SpanId] = _traceIdHigh match {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See comment above about removing this and just using the constructor param.

@@ -143,9 +148,9 @@ object Trace {
val spanId = SpanId(rng.nextLong())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not an expert, and this isn't part of this change, but I recall a comment by @adriancole saying that a span-id of 0 is not valid, but it's possible that this rng.nextLong() could give us a 0. Is that only true in the high bits?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ThreadLocalRandom.current().nextLong(1,Long.MaxValue) could provide non-zero positive values, but as I understood it all longs are valid except 0.

Copy link
Contributor

@codefromthecrypt codefromthecrypt Oct 4, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

traceIdHigh is invalid as zero for sure (though using the aws-convertable scheme I mentioned, it is impossible to end up with a zero traceIdHigh unless you are tracing 1970 :) ).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

one thing to bear in mind is that negative is needed or else you won't get full random. if trying to avoid 0, you can loop while nextLong == 0. This is unlikely to loop twice

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For simplicity, I'd vote for looping while nextLong == 0. Another alternative, but one that is probably more work and not worth it, is

var spanId = ThreadLocalRandom.current().nextLong(Long.MinValue, Long.MaxValue - 1)
if (0l <= spanId) {
  spanId += 1
}

FWIW, it's still not clear to me whether 0 is even illegal.

@@ -143,9 +148,9 @@ object Trace {
val spanId = SpanId(rng.nextLong())
idOption match {
case Some(id) =>
TraceId(Some(id.traceId), Some(id.spanId), spanId, id.sampled, id.flags)
TraceId(Some(id.traceId), Some(id.spanId), spanId, id.sampled, id.flags, id.traceIdHigh)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

An aside: this is so close to being id.copy(spanId = spanId) but I'm worried about the subtle differences in def traceId etc.

case None =>
TraceId(None, None, spanId, None, Flags())
TraceId(None, None, spanId, None, Flags(), None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also an aside, but I'm surprised we don't have a TraceId.Empty instance laying about.

@@ -476,6 +478,7 @@ private[finagle] object Message {
// TODO: technically we should probably check for duplicate
// keys, but for now, just pick the latest one.
key match {
// TODO: support traceIdHigh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Treq is dead to us: its only used for backward compatibility so I don't think a TODO is necessary. A note that we're not going to support 128-bit trace id's in Treq is enough. If someone upgrades, which they'd need to do to support the 128-bit trace id's anyway, they are going to be using Tdispatch/Rdispatch which should get 128-bit via this PR.

Someone should correct me if this is not right.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

@@ -39,6 +39,7 @@ private[finagle] class RichRequestHeader(val header: RequestHeader) extends AnyV
if (header.isSetParent_span_id) Some(SpanId(header.getParent_span_id)) else None,
SpanId(header.getSpan_id),
if (header.isSetSampled) Some(header.isSampled) else None,
if (header.isSetFlags) Flags(header.getFlags) else Flags()
if (header.isSetFlags) Flags(header.getFlags) else Flags(),
None // TODO: Regenerate tracing.thrift, and update to support high bits in 128 TraceID
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we do this as part of this PR? Outside of Twitter, thrift is the most heavily used RPC protocol, so it would be a shame to not have it.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@bryce-anderson I commented on this in the discussion part of the PR, but it's probably gotten buried. I tried using thrift 0.9 and 0.10 (both installed via homebrew) to regenerate tracing.thrift using the commands at the top of the file, and these resulted in code that wouldn't compile. There were numerous references to apache libs which weren't resolved. Do you know if there have been manual modifications to the generated files, or if you all are using a modified version of thrift?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We use old thrift, libthrift-0.5. ☹️

Copy link
Contributor

@nepthar nepthar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks so much for working on this! The approach looks good to me.

* span, while TraceIds contain a span ID as well as context (parentId
* and traceId).
*/
* Defines trace identifiers. Span IDs name a particular (unique)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

could you please revert this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah man. I had copy/pasted the entirety of SpanId to overwrite a previous commit in which I attempted to extend all spans to 128-bit. My IDE settings auto-formatted to these changes on save, and I didn't catch it. I also didn't catch the differing indent on new methods like apply below. I'll fix these up.

Do you all have IDE-specific formatting settings shared for IntelliJ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ah sounds like a pain :/

I'm not too familiar with intellij, but I did some poking around and I found that we have some kind of Twitter scheme for code style here and that I could export it as XML. This is a totally unofficial solution and I'm not gonna guarantee that it will actually format properly, but you're welcome to give it a shot: https://pastebin.com/ZL0UkQ98.

@@ -6,6 +6,7 @@ import com.twitter.finagle.util.ByteArrays
import com.twitter.io.Buf
import com.twitter.util._
import java.net.InetSocketAddress

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

please revert

if (body.length != 32)
return Throw(new IllegalArgumentException("Expected 32 bytes"))
// Allows for 64-bit or 128-bit trace identifiers
if (body.length != 32 && body.length != 40)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO, this is OK for now. I tried to get some kind of "state of the union" view of where tracing is heading and came up with nothing conclusive in 10 minutes - we'd have to do some serious digging to answer this question, I think.

@@ -476,6 +478,7 @@ private[finagle] object Message {
// TODO: technically we should probably check for duplicate
// keys, but for now, just pick the latest one.
key match {
// TODO: support traceIdHigh
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed.

ByteArrays.put64be(bytes, 8, traceId.parentId.toLong)
ByteArrays.put64be(bytes, 16, traceId.traceId.toLong)
val bytes = new Array[Byte](40)
ByteArrays.put64be(bytes, 0, traceId.spanId.self)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest leaving it as toLong for clarity

bytes
}

/**
* Deserialize a TraceId from an array of bytes.
*/
def deserialize(bytes: Array[Byte]): Try[TraceId] = {
if (bytes.length != 32) {
Throw(new IllegalArgumentException("Expected 32 bytes"))
// Allows for 64-bit or 128-bit trace identifiers
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this would be nice in the scaladoc instead of a comment

@codefromthecrypt
Copy link
Contributor

PS there's a small change we can consider when generating tracedIdHigh, to support conversion to amazon x-ray trace format. If we did this, we could provision traces that can tunnel through ELBs, fir example.

  static long nextTraceIdHigh(Random prng) {
    long epochSeconds = System.currentTimeMillis() / 1000;
    int random = prng.nextInt();
    return (epochSeconds & 0xffffffffL) << 32
        |  (random & 0xffffffffL);
  }

See openzipkin/zipkin#1754

@codefromthecrypt
Copy link
Contributor

FYI there was at some point a question about other formats. census (dapper, grpc, etc) is piloting a format which may be accepted as a standard (named trace-context), or changed to suit that. It is too early to consider, but anyway it is largely compatible with what is going on with the trace context. https://github.com/census-instrumentation/opencensus-specs/blob/master/encodings/BinaryEncoding.md

I think of this as a tactical, low-risk change to facilitate 128-bit IDs, and not necessarily a change the world, strategic one. Expect some time next year a plan for the latter, and of course participate between now and then if you can! https://github.com/TraceContext/tracecontext-spec

@CLAassistant
Copy link

CLAassistant commented Oct 16, 2017

CLA assistant check
All committers have signed the CLA.

* Style cleanup
* Cleaner construction of 128 bit trace id
* Remove and comment on lack of legacy trace id support in TReq
* Loop on spanId=0 in Trace#nextId
@jimschubert jimschubert force-pushed the tracing-128-bit-spans branch from 6d9531b to 6e884e1 Compare October 16, 2017 02:06
Defaulting ot 32bytes on wire when 64bit TraceId prevents
forcing upgrades to support the optional 128bit TraceId.

This reverts a previous change to the test which assumed a hard binary
message update to 40 bytes.
@jimschubert
Copy link
Contributor Author

I think I hit all the pieces of feedback in this except:

  • Adrian's comment about nextTraceIdHigh: it wasn't clear if this was an ask to add to this PR or an aside for future consideration
  • Bryce's comments related to thrift (regenerating and updating RichRequestHeader.scala): I spent some time working on getting thrift 0.5 compiled locally, but stopped after I felt like I was spinning my wheels resolving old dependency matches.

The PR is blocked on thrift/mux changes since I can't generate tracing.thrift into valid code. I could modify the generated code by hand, but I'd rather not do that and introduce bugs.

@codefromthecrypt
Copy link
Contributor

fyi I think nextTraceIdHigh could easily be done now, and probably should as a lot of sites are running in AWS and this helps make sure we can interop. Is there a technical problem implementing? If so, I can help more than what I pasted.

@codefromthecrypt
Copy link
Contributor

probably part of my last comment, but I can't see anywhere where we generate a 128-bit trace ID? Looks like this change will propagate, but not create 128-bit IDs. Am I missing something?

Either here or very soon, we should allow generation of new traces in http with 128-bit trace IDs, possibly based on a flag. When we generate 128-bit IDs, that would be the code that eventually exercises nextTraceIdHigh.

@jimschubert
Copy link
Contributor Author

@adriancole I don't see any technical reason to not generate traceIdHigh now, especially since current Finagle truncates to a single long if it encounters a 128bit trace id.

Do you think this is behavior that should be behind a flag?

I may not be able to get back to that addition for a week, though. So if you want to assist I'd be fine with that as well :D. I think this PR will be blocked until I can work out generating the tracing.thrift changes so 128bit Trace ID support can also be added to thrift logic.

@codefromthecrypt
Copy link
Contributor

Forgive old-school, but here's a patch I think works!

diff --git a/finagle-core/src/main/scala/com/twitter/finagle/tracing/Trace.scala b/finagle-core/src/main/scala/com/twitter/finagle/tracing/Trace.scala
index bf3287a22..370978cba 100644
--- a/finagle-core/src/main/scala/com/twitter/finagle/tracing/Trace.scala
+++ b/finagle-core/src/main/scala/com/twitter/finagle/tracing/Trace.scala
@@ -6,8 +6,14 @@ import com.twitter.finagle.util.ByteArrays
 import com.twitter.io.Buf
 import com.twitter.util._
 import java.net.InetSocketAddress
+
+import com.twitter.app.GlobalFlag
+
 import scala.util.Random
 
+object traceId128Bit extends GlobalFlag(false, "When true, new root spans will have 128-bit trace IDs. Defaults to false (64-bit).")
+
+
 /**
  * This is a tracing system similar to Dapper:
  *
@@ -92,7 +98,8 @@ object Trace {
   }
 
   private[this] val rng = new Random
-  private[this] val defaultId = TraceId(None, None, SpanId(rng.nextLong()), None, Flags(), None)
+  private[this] val defaultId =
+    TraceId(None, None, SpanId(rng.nextLong()), None, Flags(), if (traceId128Bit()) Some(nextTraceIdHigh()) else None)
   @volatile private[this] var tracingEnabled = true
 
   private[this] val EmptyTraceCtxFn = () => TraceCtx.empty
@@ -154,8 +161,10 @@ object Trace {
     idOption match {
       case Some(id) =>
         TraceId(Some(id.traceId), Some(id.spanId), spanId, id.sampled, id.flags, id.traceIdHigh)
-      case None =>
-        TraceId(None, None, spanId, None, Flags(), None)
+      case None => {
+        val traceIdHigh = if (traceId128Bit()) Some(nextTraceIdHigh()) else None
+        TraceId(None, None, spanId, None, Flags(), traceIdHigh)
+      }
     }
   }
 
@@ -377,4 +386,19 @@ object Trace {
       }
     }
   }
+
+  /**
+   * Some tracing systems such as Amazon X-Ray encode the orginal timestamp in
+   * order enable even partitions in the backend. As sampling only occurs on
+   * low 64-bits anyway, we encode epoch seconds into high-bits to support
+   * downstreams who have a timestamp requirement.
+   *
+   * The 128-bit trace ID (composed of high/low) composes to the following:
+   * |---- 32 bits for epoch seconds --- | ---- 96 bits for random number --- |
+   */
+  def nextTraceIdHigh(): SpanId = {
+    val epochSeconds = Time.now.sinceEpoch.inSeconds
+    val random = rng.nextInt()
+    SpanId((epochSeconds & 0xffffffffL) << 32 | (random & 0xffffffffL))
+  }
 }
diff --git a/finagle-core/src/test/scala/com/twitter/finagle/tracing/TraceTest.scala b/finagle-core/src/test/scala/com/twitter/finagle/tracing/TraceTest.scala
index 6484f4dae..cf9b8b131 100644
--- a/finagle-core/src/test/scala/com/twitter/finagle/tracing/TraceTest.scala
+++ b/finagle-core/src/test/scala/com/twitter/finagle/tracing/TraceTest.scala
@@ -401,4 +401,12 @@ class TraceTest extends FunSuite with MockitoSugar with BeforeAndAfter with OneI
       case rv => fail(s"Got $rv")
     }
   }
+
+  // example from X-Amzn-Trace-Id: Root=1-5759e988-bd862e3fe1be46a994272793;Sampled=1
+  test("Trace.nextTraceIdHigh: encodes epoch seconds") {
+    Time.withTimeAt(Time.fromSeconds(1465510280)) { tc => // Thursday, June 9, 2016 10:11:20 PM
+      val traceIdHigh = Trace.nextTraceIdHigh()
+      assert(traceIdHigh.toString.startsWith("5759e988")) == true
+    }
+  

@jimschubert
Copy link
Contributor Author

@adriancole thanks for that patch. I've had a lot less free time lately than I anticipated.

I finally worked out a docker image for Thrift 0.5 (https://hub.docker.com/r/jimschubert/thrift/), and I was able to regenerate compilable code from tracing.thrift. I've added a test for the thriftmux server + Finagle thrift client to validate 128bit TraceID propagation. Please let me know if I missed anything.

btw, sorry for the multiple commits and build failures. When I try to run the entire test suite locally, I run out of heap space so I'm relying on the CI outputs. (✪㉨✪)

@codefromthecrypt
Copy link
Contributor

codefromthecrypt commented Oct 27, 2017 via email

@codefromthecrypt
Copy link
Contributor

codefromthecrypt commented Oct 27, 2017 via email

@codefromthecrypt
Copy link
Contributor

codefromthecrypt commented Oct 27, 2017 via email

* develop:
  finagle: Fixed some typos
  Twitter OSS: Post-release
  finagle-core: Fix race in ProcessCoordinate
  Twitter-oss: Prepare OSS libraries for release
  finagle, finatra, scrooge, twitter-server, util: Move to YY.MM.x versioning
@jimschubert
Copy link
Contributor Author

@adriancole thanks for the clarification. I've made that last change, and I think that addresses all the feedback.

@@ -58,18 +59,53 @@ object SpanId {
}
}

case class TraceId128(low: Option[SpanId], high: Option[SpanId])
object TraceId128 {
val empty = TraceId128(None, None)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: include the type : TraceId128 for public members

Copy link
Contributor

@kevinoliver kevinoliver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

lgtm. thanks for your patience and the code documentation as well.

i left some very superficial comments.

@@ -26,7 +26,7 @@ object SpanId {
val s = "%02x".format(bb)
Array(s(0), s(1))
}
).toArray
).toArray
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: is this a stray whitespace addition?

Copy link
Contributor Author

@jimschubert jimschubert Oct 31, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah man, this IntelliJ setting (as well as format on save) keeps getting reset when I reopen IntelliJ and the change has snuck past me a few times now.

I'll edit the whitespace changes in vim. Thanks for pointing it out. Sorry about that.

ByteArrays.put64be(bytes, 0, traceId.spanId.toLong)
ByteArrays.put64be(bytes, 8, traceId.parentId.toLong)
ByteArrays.put64be(bytes, 16, traceId.traceId.toLong)
ByteArrays.put64be(bytes, 24, flags.toLong)
if(traceId.traceIdHigh.isDefined) ByteArrays.put64be(bytes, 32, traceId.traceIdHigh.get.toLong)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

another nit: space after if

if (bytes.length != 32) {
Throw(new IllegalArgumentException("Expected 32 bytes"))
if (bytes.length != 32 && bytes.length != 40) {
Throw(new IllegalArgumentException("Expected 32 or 40 bytes"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

can you include bytes.length in the failure? it can be helpful for debugging. something like:

Throw(new IllegalArgumentException("Expected 32 or 40 bytes, was: " + bytes.length))

} else {
val span64 = ByteArrays.get64be(bytes, 0)
val parent64 = ByteArrays.get64be(bytes, 8)
val trace64 = ByteArrays.get64be(bytes, 16)
val flags64 = ByteArrays.get64be(bytes, 24)

val traceIdHigh = if(bytes.length == 40) Some(SpanId(ByteArrays.get64be(bytes, 32))) else None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same nit re space after if

@@ -205,5 +249,5 @@ final case class TraceId(
override def hashCode(): Int =
ids.hashCode()

override def toString = s"$traceId.$spanId<:$parentId"
override def toString = s"${if(traceIdHigh.isEmpty) "" else traceIdHigh.get}$traceId.$spanId<:$parentId"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

same nit with whitespace after if

var nextLong = rng.nextLong()
if (nextLong == 0L) {
// NOTE: spanId of 0 is invalid, so guard against that here.
do nextLong = rng.nextLong() while (nextLong == 0L)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

feel free to ignore, but i think you can tighten it up:

var nextLong = rng.nextLong()
while (nextLong == 0L) {
  nextLong = rng.nextLong()
}

Copy link
Contributor

@bryce-anderson bryce-anderson left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM after @kevinoliver's suggestions.

Copy link
Contributor

@nepthar nepthar left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Everything LGTM w/ kevins fixes + the formatting thing.

@@ -70,6 +75,8 @@ object Trace {
val trace64 = ByteArrays.get64be(bytes, 16)
val flags64 = ByteArrays.get64be(bytes, 24)

val traceIdHigh = if(body.length == 40) Some(SpanId(ByteArrays.get64be(bytes, 32))) else None
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit - space after if

Copy link
Contributor

@kevinoliver kevinoliver left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks again!

@jcrossley
Copy link
Contributor

@jimschubert thanks! I'm going to pull this in internally and will let you know once it's merged.

@jimschubert
Copy link
Contributor Author

@jcrossley sounds good. I appreciate it!

@jcrossley
Copy link
Contributor

@jimschubert This has been merged internally and will go out with the next release. Thank you for the contribution!

@jimschubert
Copy link
Contributor Author

@jcrossley that's awesome. Thanks!

@codefromthecrypt
Copy link
Contributor

oops.. we missed a spot.

in TraceInfo.scala, we forgot to serialize the trace id header properly. When you enable traceId128bit, the below doesn't propagate the high bits (sending only half the trace ID)

  def setClientRequestHeaders(request: Request): Unit = {
    removeAllHeaders(request.headerMap)

    val traceId = Trace.id
    request.headerMap.add(Header.TraceId, traceId.traceId.toString)
    request.headerMap.add(Header.SpanId, traceId.spanId.toString)

I think it should be something like

  // if traceIdHigh
    request.headerMap.add(Header.TraceId, traceId.traceIdHigh.toString + traceId.traceId.toString)

@jimschubert
Copy link
Contributor Author

@adriancole @jcrossley since this has been merged internally already, what would be the process to add this last piece? Should I open another PR, or can this change just be made internally?

@codefromthecrypt
Copy link
Contributor

codefromthecrypt commented Nov 20, 2017 via email

finaglehelper pushed a commit that referenced this pull request Nov 29, 2017
…uests

Summary: Problem

When `com.twitter.finagle.tracing.traceId128Bit=true` outbound requests don't
include the entire trace ID, rather only the lower 64 bits. This can result in
broken traces, depending on if Zipkin is set to operate tolerantly.

Solution

This was due to a missed change where we missed setting the trace ID header
according to `traceIdHigh`. This fixes that and backfills tests

Result

`X-B3-TraceId` values should be 32 lowerhex length when
`com.twitter.finagle.tracing.traceId128Bit=true` and a new trace is sent
downstream

See #651

Signed-off-by: Isabel Martin <imartin@twitter.com>

Differential Revision: https://phabricator.twitter.biz/D115342
@mosesn mosesn closed this Jan 11, 2018
@jimschubert
Copy link
Contributor Author

Mentioning openzipkin/b3-propagation#6 here so it's connected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

Successfully merging this pull request may close these issues.

8 participants