-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Allow configurable elasticsearch index date format #840
Allow configurable elasticsearch index date format #840
Conversation
Signed-off-by: Pavel Nikolov <pavel.nikolov@fairfaxmedia.com.au>
overall looks good, but a couple of comments:
|
|
The cleaner script is here https://github.com/jaegertracing/jaeger/blob/master/plugin/storage/es/esCleaner.py |
Do you think that the dependencies job would be affected as well? https://github.com/jaegertracing/spark-dependencies/blob/master/jaeger-spark-dependencies-elasticsearch/src/main/java/io/jaegertracing/spark/dependencies/elastic/ElasticsearchDependenciesJob.java#L151 |
@@ -76,6 +77,7 @@ type Span struct { | |||
// NewSpanWriter creates a new SpanWriter for use | |||
func NewSpanWriter( | |||
client es.Client, | |||
dateFormat string, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since we're breaking the API, let's use a struct instead
type SpanWriterOptions struct {
dateFormat string
numShards int64
numReplicas int64
}
func NewSpanWriter(
client es.Client,
logger *zap.Logger,
metricsFactory metrics.Factory,
options SpanWriterOptions) {
}
It will be easier to extend if we want to make index name configurable
|
||
dateFormat string | ||
|
||
expectedSpanIndexName string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blank lines not needed
Yes, Spark job will be affected as well. But it would have to be a different PR. |
@pavelnikolov there has been some discussion in #799 re configurable index name prefixes, which is related to the changes being made here (your changes so far focus on date format, but the changes need to be made in the same 3~ places and coordinated). @yurishkuro has suggested his preference would be to make both sets of changes together to save doing the coordinated changes twice. @pavelnikolov can you take a look over #799 and see if you feel up to making the suggested changes in this PR? I am happy to help/provide input if you need also, was planning to PR separately otherwise :) |
I’ll check soon. I’ve been too busy lately with other things.
…Sent from my phone
On 30 May 2018, at 14:21, Nathan Sullivan ***@***.***> wrote:
@pavelnikolov there has been some discussion in #799 re configurable index name prefixes, which is related to the changes being made here (your changes so far focus on date format, but the changes need to be made in the same 3~ places and coordinated). @yurishkuro has suggested his preference would be to make both sets of changes together to save doing the coordinated changes twice.
@pavelnikolov can you take a look over #799 and see if you feel up to making the suggested changes in this PR? I am happy to help/provide input if you need also, was planning to PR separately otherwise :)
—
You are receiving this because you were mentioned.
Reply to this email directly, view it on GitHub, or mute the thread.
|
I don't think we should go with this approach - configurable date format option. The biggest downside is the limitation on retention configuration:
It's also harder to configure and ES cleaner would not look nice either - re date format in python is different. I think we should go with truncation ( The reader would use the // Returns the array of indices that we need to query, based on query params
func (s *SpanReader) indicesForTimeRange(indexName string, startTime time.Time, endTime time.Time) []string {
var indices []string
firstIndex := indexWithDate(indexName, startTime, s.dateFormat)
currentIndex := indexWithDate(indexName, endTime, s.dateFormat)
for currentIndex != firstIndex {
indices = append(indices, currentIndex)
// use trunc time
endTime = endTime.Add(-s.truncTime * time.Hour)
currentIndex = indexWithDate(indexName, endTime, s.dateFormat)
}
return append(indices, firstIndex)
}
jaegerIndices := s.indicesForTimeRange(s.serviceIndexPrefix, currentTime.Add(-s.maxSpanAge), currentTime) |
cc @jaegertracing/elasticsearch |
Was there a specific reason someone asked specifically for date format rather than truncation? |
from #628 (comment)
I don't know how @pavelnikolov deletes indices. but IIRC curator should be format agnostic the creation date is retrieved from index properties. |
As long as the effect is the same: a span is written to the expected index, the means likely do not matter as long as the span's original timestamp is retained.
At least in later versions of elasticsearch-curator package, this is correct. We are actually moving away from jaeger's time-based index naming entirely and choosing to use the Elasticsearch Rollover API as it scales more predictably for us. Example: 25GB per-shard, automatically, which is ideal for our write/read workloads (both human, and automated anomaly detection)
At least in all of our usage, this largely doens't make sense in both our high-usage and low-usage indexes. The Elasticsearch Rollover API is a far better solution, from Elasticsearch's perspective, than sharding on time-based indices. |
Rollover API looks pretty interesting, thanks for pointing it out. Perhaps it can also be used to avoid iterating through indices at query time. |
I think the reader would have to use a different index - alias https://www.elastic.co/guide/en/elasticsearch/reference/current/indices-aliases.html. Once the write index is rolled over the old index has to be moved to read alias. Perhaps we could use this to manage retention in archive index? |
@pavelnikolov I will close this PR for two reasons:
|
Which problem is this PR solving?
Short description of the changes
SPAN_STORAGE_TYPE=elasticsearch
: