-
Notifications
You must be signed in to change notification settings - Fork 1.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Enable propagation of 128-bit TraceID #651
Conversation
Allows 128-bit TraceID to be consumed while defaulting to generating 64-bit Spans. Adds GlobalFlag c.t.f.tracing.generate128BitSpanIds to enable the generation of 128-bit SpanIDs.
The trace ID has cause to consider 128bit, but parent and span ID not. It is not likely at all to get a clash within a trace. Since neither B3 nor its successor TraceContext are on a track to have 128 bit span IDs, I would pull it from this change. For example, if you make span ID 128 bit, it will break and restart a trace on most things I am aware of (like istio, grpc/census and everything in zipkin) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for starting this!
|
||
val flags = Flags(flags64) | ||
val sampled = if (flags.isFlagSet(Flags.SamplingKnown)) { | ||
if (flags.isFlagSet(Flags.Sampled)) someTrue else someFalse | ||
} else None | ||
|
||
val traceId = TraceId( | ||
if (trace64 == parent64) None else Some(SpanId(trace64)), | ||
if (trace64 == parent64) None else Some(SpanId(traceIdHigh, trace64)), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For compat maybe better to add another spanid for high bits vs expand spanid to 128
This adds a new span, traceIdHigh, to TraceId to support the high bits of a 128bit TraceID rather than truncating to 64bits. This does not include generation of 128bit TraceID.
@adriancole I updated this PR to include an additional span to account for the high bits, rather than extending SpanId support to 128bits. This is definitely a bit cleaner. I had some concerns and TODO items.
TODO items:
|
Hi, Jim. Great progress
Summary is I think tmux can be separate as there are http-only finagle
users. Treat high absent/zero as the same.. ideally you cannot construct
Some(0). That should simplify interop and also decisions.
@mosesn other thoughts?
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
nice work
ByteArrays.put64be(bytes, 8, traceId.parentId.toLong) | ||
ByteArrays.put64be(bytes, 16, traceId.traceId.toLong) | ||
val bytes = new Array[Byte](40) | ||
ByteArrays.put64be(bytes, 0, traceId.spanId.self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I switched over to my open IntelliJ to start looking at a different area of Finagle, unrelated to this PR, and this file was open. I remembered that these were previously toLong
, and here I have changed them to self
. I wasn't even thinking about the change when I did this, I was just directly referencing the property rather than the helper method . Should I go ahead and change these (in Id.scala, IdTest.scala) back to toLong
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest leaving it as toLong for clarity
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks pretty good, thanks for the PR! Most of my suggestions are asides as I've not really dug into this code before.
ByteArrays.put64be(bytes, 0, traceId.spanId.toLong) | ||
ByteArrays.put64be(bytes, 8, traceId.parentId.toLong) | ||
ByteArrays.put64be(bytes, 16, traceId.traceId.toLong) | ||
val bytes = new Array[Byte](40) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we gate the size of the array to 32 if !traceId.traceIdHigh.isDefined
. Otherwise, even if services are using the 64-bit trace-ids, they will need to have received this patch in order to support tracing.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had thought about doing this as well, but I wasn't sure if which would be the correct approach. I'll make the change.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In rest of zipkin we treat lack of traceIDHigh as unset. There's no binary format at the moment, and at some point there will be. However, here I think it is important to remain in "compat mode" with 32byte context when there's no traceIdHigh (or it is zero). This allows folks to upgrade more slowly.
* | ||
* @return An <code>(Option[SpanId], Option[SpanId])</code> of 64bit or 128bit TraceID. Structure is (high, low). | ||
*/ | ||
def mk128BitTraceId(spanId: String): (Option[SpanId], Option[SpanId]) = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we make a named case class to return these SpanId
's instead of using a Tuple2
? I would have expected the low bytes to come first, but it's just a state of mind thing so giving it a name would make it unambiguous.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks. I meant to mark that as another question (conventions for creating one-off case classes like this) but forgot to mention it in the PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Was just getting around to this, and remembered a major part of the question. I wanted to call that case class just "TraceId", but this is already taken from the case class encapsulating the traceId, parentId, spanId, etc. In other implementations, I think I've seen the current TraceId
named things like TraceContext or SpanContext. I had considered a case class namedFullTraceId
, but I was concerned this may cause confusion with the existing TraceId
type.
Since a simple "TraceId" is already taken, are there any suggestions for naming here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if this were a java builder, I would just say something like TraceId.Builder.parseTraceId(String)
or similar, which internally sets one of the two. Maybe you could consider making a constructor of TraceId which takes a string form of TraceId and internally does the same?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I agree that the naming is now confusing at best, but don't think its something that should be addressed in this PR. If we want to take the easy route out of this, how about SpanId128
. Might want to rename the method as well, since it seems (at least to me, happy to hear why I'm wrong) that we're really dealing in span-ids.
try { | ||
val length = spanId.length() | ||
val lower64Bits = if (length <= 16) spanId else spanId.substring(length - 16) | ||
val upper64bits = if (length == 32) Some(spanId.substring(0, 16)) else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you use the if / else
block to form the whole SpanId
instead of just the substring? That would make the match below unnecessary.
} else { | ||
val span64 = ByteArrays.get64be(bytes, 0) | ||
val parent64 = ByteArrays.get64be(bytes, 8) | ||
val trace64 = ByteArrays.get64be(bytes, 16) | ||
val flags64 = ByteArrays.get64be(bytes, 24) | ||
|
||
val traceIdHigh = if(bytes.length == 40) ByteArrays.get64be(bytes, 32) else 0L |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why not Option[SpanId]
instead of Long
? Looks like your putting a Some(SpanId(0L))
into the TraceId.apply
below due to this (same in Trace.scala). Is this the only situations where you would expect to find a SpanId
of 0? If so, lets do _traceIdHigh -> traceIdHigh
and remove the method: I know that case class is pretty hairy at this point but I'd prefer the generated unapply
method to yield as accurate results as possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason I did it this way with the special cased 0L, is that systems requiring 128bit traces may be padded with zeros. On my team at Expedia, we hacked the 64bit trace ids to fit our tracing system by transforming the traces generated by Finagle into a UUID with zeroed most significant bits (example: new java.util.UUID(0,traceId)
). If the special case is dropped here and not cleaned up via the traceId
method, I assumed that would cause problems with allowing a traceIdHigh equal to 0L.
I may have misunderstood what @adriancole said with 0L being invalid. To clarify, my understanding was that both of these are invalid:
0000 0000 0000 0000 0000 0000 0000 0000
0000 0000 0000 0000 7fff ffff ffff ffff
Is the second case a valid 128bit identifier? If it is valid, then I agree we should just take the bits as they come and forward them along.
It's a little different from the tracing system I'll be integrating with, in which the identifiers are string representations of UUIDs with no constraint on the high 64bits (see haystack-idl).
@@ -179,6 +223,12 @@ final case class TraceId( | |||
case Some(id) => id | |||
} | |||
|
|||
def traceIdHigh: Option[SpanId] = _traceIdHigh match { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See comment above about removing this and just using the constructor param.
@@ -143,9 +148,9 @@ object Trace { | |||
val spanId = SpanId(rng.nextLong()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not an expert, and this isn't part of this change, but I recall a comment by @adriancole saying that a span-id of 0 is not valid, but it's possible that this rng.nextLong()
could give us a 0. Is that only true in the high bits?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ThreadLocalRandom.current().nextLong(1,Long.MaxValue)
could provide non-zero positive values, but as I understood it all longs are valid except 0.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
traceIdHigh is invalid as zero for sure (though using the aws-convertable scheme I mentioned, it is impossible to end up with a zero traceIdHigh unless you are tracing 1970 :) ).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
one thing to bear in mind is that negative is needed or else you won't get full random. if trying to avoid 0, you can loop while nextLong == 0. This is unlikely to loop twice
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For simplicity, I'd vote for looping while nextLong == 0
. Another alternative, but one that is probably more work and not worth it, is
var spanId = ThreadLocalRandom.current().nextLong(Long.MinValue, Long.MaxValue - 1)
if (0l <= spanId) {
spanId += 1
}
FWIW, it's still not clear to me whether 0 is even illegal.
@@ -143,9 +148,9 @@ object Trace { | |||
val spanId = SpanId(rng.nextLong()) | |||
idOption match { | |||
case Some(id) => | |||
TraceId(Some(id.traceId), Some(id.spanId), spanId, id.sampled, id.flags) | |||
TraceId(Some(id.traceId), Some(id.spanId), spanId, id.sampled, id.flags, id.traceIdHigh) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
An aside: this is so close to being id.copy(spanId = spanId)
but I'm worried about the subtle differences in def traceId
etc.
case None => | ||
TraceId(None, None, spanId, None, Flags()) | ||
TraceId(None, None, spanId, None, Flags(), None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also an aside, but I'm surprised we don't have a TraceId.Empty
instance laying about.
@@ -476,6 +478,7 @@ private[finagle] object Message { | |||
// TODO: technically we should probably check for duplicate | |||
// keys, but for now, just pick the latest one. | |||
key match { | |||
// TODO: support traceIdHigh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Treq
is dead to us: its only used for backward compatibility so I don't think a TODO is necessary. A note that we're not going to support 128-bit trace id's in Treq
is enough. If someone upgrades, which they'd need to do to support the 128-bit trace id's anyway, they are going to be using Tdispatch/Rdispatch
which should get 128-bit via this PR.
Someone should correct me if this is not right.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
@@ -39,6 +39,7 @@ private[finagle] class RichRequestHeader(val header: RequestHeader) extends AnyV | |||
if (header.isSetParent_span_id) Some(SpanId(header.getParent_span_id)) else None, | |||
SpanId(header.getSpan_id), | |||
if (header.isSetSampled) Some(header.isSampled) else None, | |||
if (header.isSetFlags) Flags(header.getFlags) else Flags() | |||
if (header.isSetFlags) Flags(header.getFlags) else Flags(), | |||
None // TODO: Regenerate tracing.thrift, and update to support high bits in 128 TraceID |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we do this as part of this PR? Outside of Twitter, thrift is the most heavily used RPC protocol, so it would be a shame to not have it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@bryce-anderson I commented on this in the discussion part of the PR, but it's probably gotten buried. I tried using thrift 0.9 and 0.10 (both installed via homebrew) to regenerate tracing.thrift using the commands at the top of the file, and these resulted in code that wouldn't compile. There were numerous references to apache libs which weren't resolved. Do you know if there have been manual modifications to the generated files, or if you all are using a modified version of thrift?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We use old thrift, libthrift-0.5.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks so much for working on this! The approach looks good to me.
* span, while TraceIds contain a span ID as well as context (parentId | ||
* and traceId). | ||
*/ | ||
* Defines trace identifiers. Span IDs name a particular (unique) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could you please revert this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah man. I had copy/pasted the entirety of SpanId to overwrite a previous commit in which I attempted to extend all spans to 128-bit. My IDE settings auto-formatted to these changes on save, and I didn't catch it. I also didn't catch the differing indent on new methods like apply
below. I'll fix these up.
Do you all have IDE-specific formatting settings shared for IntelliJ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Ah sounds like a pain :/
I'm not too familiar with intellij, but I did some poking around and I found that we have some kind of Twitter scheme for code style here and that I could export it as XML. This is a totally unofficial solution and I'm not gonna guarantee that it will actually format properly, but you're welcome to give it a shot: https://pastebin.com/ZL0UkQ98.
@@ -6,6 +6,7 @@ import com.twitter.finagle.util.ByteArrays | |||
import com.twitter.io.Buf | |||
import com.twitter.util._ | |||
import java.net.InetSocketAddress | |||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please revert
if (body.length != 32) | ||
return Throw(new IllegalArgumentException("Expected 32 bytes")) | ||
// Allows for 64-bit or 128-bit trace identifiers | ||
if (body.length != 32 && body.length != 40) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
IMO, this is OK for now. I tried to get some kind of "state of the union" view of where tracing is heading and came up with nothing conclusive in 10 minutes - we'd have to do some serious digging to answer this question, I think.
@@ -476,6 +478,7 @@ private[finagle] object Message { | |||
// TODO: technically we should probably check for duplicate | |||
// keys, but for now, just pick the latest one. | |||
key match { | |||
// TODO: support traceIdHigh |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Agreed.
ByteArrays.put64be(bytes, 8, traceId.parentId.toLong) | ||
ByteArrays.put64be(bytes, 16, traceId.traceId.toLong) | ||
val bytes = new Array[Byte](40) | ||
ByteArrays.put64be(bytes, 0, traceId.spanId.self) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suggest leaving it as toLong for clarity
bytes | ||
} | ||
|
||
/** | ||
* Deserialize a TraceId from an array of bytes. | ||
*/ | ||
def deserialize(bytes: Array[Byte]): Try[TraceId] = { | ||
if (bytes.length != 32) { | ||
Throw(new IllegalArgumentException("Expected 32 bytes")) | ||
// Allows for 64-bit or 128-bit trace identifiers |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this would be nice in the scaladoc instead of a comment
PS there's a small change we can consider when generating tracedIdHigh, to support conversion to amazon x-ray trace format. If we did this, we could provision traces that can tunnel through ELBs, fir example. static long nextTraceIdHigh(Random prng) {
long epochSeconds = System.currentTimeMillis() / 1000;
int random = prng.nextInt();
return (epochSeconds & 0xffffffffL) << 32
| (random & 0xffffffffL);
} |
FYI there was at some point a question about other formats. census (dapper, grpc, etc) is piloting a format which may be accepted as a standard (named trace-context), or changed to suit that. It is too early to consider, but anyway it is largely compatible with what is going on with the trace context. https://github.com/census-instrumentation/opencensus-specs/blob/master/encodings/BinaryEncoding.md I think of this as a tactical, low-risk change to facilitate 128-bit IDs, and not necessarily a change the world, strategic one. Expect some time next year a plan for the latter, and of course participate between now and then if you can! https://github.com/TraceContext/tracecontext-spec |
* Style cleanup * Cleaner construction of 128 bit trace id * Remove and comment on lack of legacy trace id support in TReq * Loop on spanId=0 in Trace#nextId
6d9531b
to
6e884e1
Compare
Defaulting ot 32bytes on wire when 64bit TraceId prevents forcing upgrades to support the optional 128bit TraceId. This reverts a previous change to the test which assumed a hard binary message update to 40 bytes.
I think I hit all the pieces of feedback in this except:
The PR is blocked on thrift/mux changes since I can't generate |
fyi I think nextTraceIdHigh could easily be done now, and probably should as a lot of sites are running in AWS and this helps make sure we can interop. Is there a technical problem implementing? If so, I can help more than what I pasted. |
probably part of my last comment, but I can't see anywhere where we generate a 128-bit trace ID? Looks like this change will propagate, but not create 128-bit IDs. Am I missing something? Either here or very soon, we should allow generation of new traces in http with 128-bit trace IDs, possibly based on a flag. When we generate 128-bit IDs, that would be the code that eventually exercises |
@adriancole I don't see any technical reason to not generate traceIdHigh now, especially since current Finagle truncates to a single long if it encounters a 128bit trace id. Do you think this is behavior that should be behind a flag? I may not be able to get back to that addition for a week, though. So if you want to assist I'd be fine with that as well :D. I think this PR will be blocked until I can work out generating the tracing.thrift changes so 128bit Trace ID support can also be added to thrift logic. |
Forgive old-school, but here's a patch I think works! diff --git a/finagle-core/src/main/scala/com/twitter/finagle/tracing/Trace.scala b/finagle-core/src/main/scala/com/twitter/finagle/tracing/Trace.scala
index bf3287a22..370978cba 100644
--- a/finagle-core/src/main/scala/com/twitter/finagle/tracing/Trace.scala
+++ b/finagle-core/src/main/scala/com/twitter/finagle/tracing/Trace.scala
@@ -6,8 +6,14 @@ import com.twitter.finagle.util.ByteArrays
import com.twitter.io.Buf
import com.twitter.util._
import java.net.InetSocketAddress
+
+import com.twitter.app.GlobalFlag
+
import scala.util.Random
+object traceId128Bit extends GlobalFlag(false, "When true, new root spans will have 128-bit trace IDs. Defaults to false (64-bit).")
+
+
/**
* This is a tracing system similar to Dapper:
*
@@ -92,7 +98,8 @@ object Trace {
}
private[this] val rng = new Random
- private[this] val defaultId = TraceId(None, None, SpanId(rng.nextLong()), None, Flags(), None)
+ private[this] val defaultId =
+ TraceId(None, None, SpanId(rng.nextLong()), None, Flags(), if (traceId128Bit()) Some(nextTraceIdHigh()) else None)
@volatile private[this] var tracingEnabled = true
private[this] val EmptyTraceCtxFn = () => TraceCtx.empty
@@ -154,8 +161,10 @@ object Trace {
idOption match {
case Some(id) =>
TraceId(Some(id.traceId), Some(id.spanId), spanId, id.sampled, id.flags, id.traceIdHigh)
- case None =>
- TraceId(None, None, spanId, None, Flags(), None)
+ case None => {
+ val traceIdHigh = if (traceId128Bit()) Some(nextTraceIdHigh()) else None
+ TraceId(None, None, spanId, None, Flags(), traceIdHigh)
+ }
}
}
@@ -377,4 +386,19 @@ object Trace {
}
}
}
+
+ /**
+ * Some tracing systems such as Amazon X-Ray encode the orginal timestamp in
+ * order enable even partitions in the backend. As sampling only occurs on
+ * low 64-bits anyway, we encode epoch seconds into high-bits to support
+ * downstreams who have a timestamp requirement.
+ *
+ * The 128-bit trace ID (composed of high/low) composes to the following:
+ * |---- 32 bits for epoch seconds --- | ---- 96 bits for random number --- |
+ */
+ def nextTraceIdHigh(): SpanId = {
+ val epochSeconds = Time.now.sinceEpoch.inSeconds
+ val random = rng.nextInt()
+ SpanId((epochSeconds & 0xffffffffL) << 32 | (random & 0xffffffffL))
+ }
}
diff --git a/finagle-core/src/test/scala/com/twitter/finagle/tracing/TraceTest.scala b/finagle-core/src/test/scala/com/twitter/finagle/tracing/TraceTest.scala
index 6484f4dae..cf9b8b131 100644
--- a/finagle-core/src/test/scala/com/twitter/finagle/tracing/TraceTest.scala
+++ b/finagle-core/src/test/scala/com/twitter/finagle/tracing/TraceTest.scala
@@ -401,4 +401,12 @@ class TraceTest extends FunSuite with MockitoSugar with BeforeAndAfter with OneI
case rv => fail(s"Got $rv")
}
}
+
+ // example from X-Amzn-Trace-Id: Root=1-5759e988-bd862e3fe1be46a994272793;Sampled=1
+ test("Trace.nextTraceIdHigh: encodes epoch seconds") {
+ Time.withTimeAt(Time.fromSeconds(1465510280)) { tc => // Thursday, June 9, 2016 10:11:20 PM
+ val traceIdHigh = Trace.nextTraceIdHigh()
+ assert(traceIdHigh.toString.startsWith("5759e988")) == true
+ }
+ |
@adriancole thanks for that patch. I've had a lot less free time lately than I anticipated. I finally worked out a docker image for Thrift 0.5 (https://hub.docker.com/r/jimschubert/thrift/), and I was able to regenerate compilable code from tracing.thrift. I've added a test for the thriftmux server + Finagle thrift client to validate 128bit TraceID propagation. Please let me know if I missed anything. btw, sorry for the multiple commits and build failures. When I try to run the entire test suite locally, I run out of heap space so I'm relying on the CI outputs. (✪㉨✪) |
Nice work Jim.
Nothing on my side at the moment.. all good. @llinder this when released
means linkerd can do 128bit trace ids (potentially interop with aws.. we
can discuss offline this part)
|
Oops I meant @klingperf
|
If 0L isn't converted to None via the getter in Trace, this will allow
propagation of invalid 128 bit TraceIDs (such as a stringified long which
is left-padded with 0s). The getter protects against this, with the effect
of reducing the span to 64 bit. Is that not acceptable, or do you think it
should propagate ids with invalid high bits?
I hadn't changed this because the guard on 0L works similarly to the
previous implementation which reduce all 128bit TraceIDs to 64bit, and I
was waiting for a response to my previous comment before changing.
the TL;DR; is that a trace ID high of zero is the same as a 64-bit ID and
should be serialized as a 64-bit one. I understand that it is possible that
someone sent a incoming trace with traceIDHigh as zero, but that's very
unlikely as this binary marshaling approach is finagle-specific. In other
words, the only way a 40byte thing will have a 0L traceIdHigh is some
clone, which would have to mis-implement what we do here.
|
* develop: finagle: Fixed some typos Twitter OSS: Post-release finagle-core: Fix race in ProcessCoordinate Twitter-oss: Prepare OSS libraries for release finagle, finatra, scrooge, twitter-server, util: Move to YY.MM.x versioning
@adriancole thanks for the clarification. I've made that last change, and I think that addresses all the feedback. |
@@ -58,18 +59,53 @@ object SpanId { | |||
} | |||
} | |||
|
|||
case class TraceId128(low: Option[SpanId], high: Option[SpanId]) | |||
object TraceId128 { | |||
val empty = TraceId128(None, None) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: include the type : TraceId128
for public members
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
lgtm. thanks for your patience and the code documentation as well.
i left some very superficial comments.
@@ -26,7 +26,7 @@ object SpanId { | |||
val s = "%02x".format(bb) | |||
Array(s(0), s(1)) | |||
} | |||
).toArray | |||
).toArray |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: is this a stray whitespace addition?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeah man, this IntelliJ setting (as well as format on save) keeps getting reset when I reopen IntelliJ and the change has snuck past me a few times now.
I'll edit the whitespace changes in vim. Thanks for pointing it out. Sorry about that.
ByteArrays.put64be(bytes, 0, traceId.spanId.toLong) | ||
ByteArrays.put64be(bytes, 8, traceId.parentId.toLong) | ||
ByteArrays.put64be(bytes, 16, traceId.traceId.toLong) | ||
ByteArrays.put64be(bytes, 24, flags.toLong) | ||
if(traceId.traceIdHigh.isDefined) ByteArrays.put64be(bytes, 32, traceId.traceIdHigh.get.toLong) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
another nit: space after if
if (bytes.length != 32) { | ||
Throw(new IllegalArgumentException("Expected 32 bytes")) | ||
if (bytes.length != 32 && bytes.length != 40) { | ||
Throw(new IllegalArgumentException("Expected 32 or 40 bytes")) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you include bytes.length
in the failure? it can be helpful for debugging. something like:
Throw(new IllegalArgumentException("Expected 32 or 40 bytes, was: " + bytes.length))
} else { | ||
val span64 = ByteArrays.get64be(bytes, 0) | ||
val parent64 = ByteArrays.get64be(bytes, 8) | ||
val trace64 = ByteArrays.get64be(bytes, 16) | ||
val flags64 = ByteArrays.get64be(bytes, 24) | ||
|
||
val traceIdHigh = if(bytes.length == 40) Some(SpanId(ByteArrays.get64be(bytes, 32))) else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same nit re space after if
@@ -205,5 +249,5 @@ final case class TraceId( | |||
override def hashCode(): Int = | |||
ids.hashCode() | |||
|
|||
override def toString = s"$traceId.$spanId<:$parentId" | |||
override def toString = s"${if(traceIdHigh.isEmpty) "" else traceIdHigh.get}$traceId.$spanId<:$parentId" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same nit with whitespace after if
var nextLong = rng.nextLong() | ||
if (nextLong == 0L) { | ||
// NOTE: spanId of 0 is invalid, so guard against that here. | ||
do nextLong = rng.nextLong() while (nextLong == 0L) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
feel free to ignore, but i think you can tighten it up:
var nextLong = rng.nextLong()
while (nextLong == 0L) {
nextLong = rng.nextLong()
}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM after @kevinoliver's suggestions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Everything LGTM w/ kevins fixes + the formatting thing.
@@ -70,6 +75,8 @@ object Trace { | |||
val trace64 = ByteArrays.get64be(bytes, 16) | |||
val flags64 = ByteArrays.get64be(bytes, 24) | |||
|
|||
val traceIdHigh = if(body.length == 40) Some(SpanId(ByteArrays.get64be(bytes, 32))) else None |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit - space after if
…eption now includes bytes.length
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
thanks again!
@jimschubert thanks! I'm going to pull this in internally and will let you know once it's merged. |
@jcrossley sounds good. I appreciate it! |
@jimschubert This has been merged internally and will go out with the next release. Thank you for the contribution! |
@jcrossley that's awesome. Thanks! |
oops.. we missed a spot. in
I think it should be something like
|
@adriancole @jcrossley since this has been merged internally already, what would be the process to add this last piece? Should I open another PR, or can this change just be made internally? |
@adriancole <https://github.com/adriancole> @jcrossley
<https://github.com/jcrossley> since this has been merged internally
already, what would be the process to add this last piece? Should I open
another PR, or can this change just be made internally?
I just made one here #663
|
…uests Summary: Problem When `com.twitter.finagle.tracing.traceId128Bit=true` outbound requests don't include the entire trace ID, rather only the lower 64 bits. This can result in broken traces, depending on if Zipkin is set to operate tolerantly. Solution This was due to a missed change where we missed setting the trace ID header according to `traceIdHigh`. This fixes that and backfills tests Result `X-B3-TraceId` values should be 32 lowerhex length when `com.twitter.finagle.tracing.traceId128Bit=true` and a new trace is sent downstream See #651 Signed-off-by: Isabel Martin <imartin@twitter.com> Differential Revision: https://phabricator.twitter.biz/D115342
Mentioning openzipkin/b3-propagation#6 here so it's connected. |
Problem
See #564
Finagle currently only "tolerates" 128-bit TraceID by dropping the high 64-bits.
My target tracing system uses all 128-bit ids (trace, parent, and span) generated from UUIDs, so support for propagating 128-bit TraceID is preferable.
Solution
Allows 128-bit TraceID to be consumed and propagated through the stack while defaulting to generating 64-bit TraceIDs.
Result
Finagle tracing continues to also support 64-bit trace identifiers. The upper 64-bit of a 128-bit TraceID is included on the wire format as below:
Here if the upper 8 bytes are set, they're treated as the upper 64-bit part of a TraceID.
Breaking: TraceId serialization becomes 40 bytes on the wire.QuestionI started down the path of generating 128-bit spanId and parentId, but I'm not sure of a good way to support this inTrace#tryUnmarshal
so I decided to open the PR for feedback. Is this supported elsewhere? I've read some discussion that says TraceID makes sense in high traffic environments, while spans are generally fewer and unlikely to cause collisions. I can remove the flag, but wanted to gather thoughts first.