-
Notifications
You must be signed in to change notification settings - Fork 3.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add Block checksum validation and "influx_inspect verify" tool #5582
Conversation
tstore.EngineOptions.Config.Dir = filepath.Join(path, "data") | ||
tstore.EngineOptions.Config.WALLoggingEnabled = false | ||
tstore.EngineOptions.Config.WALDir = filepath.Join(path, "wal") | ||
if err := tstore.Open(); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This tool really shouldn't be loading the store and shards. It's meant to be a separate and independent implementation of lower level TSM functions. Calling Open
has some side-effects like kicking off compactions, cleanup, etc.. which makes it unsafe to run outside of the server.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Note taken, I will just open the file manually and parse the bytes in based on the TSM structure. Is that what you mean?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes. Checksum is the first 4 bytes of each block.
@seiflotfy #5502 was about validating the value we have here: https://github.com/influxdata/influxdb/blob/master/cmd/influx_inspect/tsm.go#L547 since we just print it. Would you be able to re-work this so that it does not use the |
I am on it 👍 |
So I updated the code to now "walk" through the data directory getting all the .tsm files. It hurts a bit to do that but I found no other way around it. The new output looks as follows:
|
func (t *TSMReader) ValidateChecksum() map[string]map[string]bool { | ||
checksumMap := make(map[string]map[string]bool) | ||
for _, key := range t.Keys() { | ||
checksums, _ := t.accessor.getChecksums(key) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think you need to get a RLock
on t.accessor
here?
LGTM 👍 |
@jwilder is this good to go then? |
@@ -99,6 +100,7 @@ type blockAccessor interface { | |||
readBytes(entry *IndexEntry, buf []byte) ([]byte, error) | |||
path() string | |||
close() error | |||
getChecksums(key string) (map[string]bool, error) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you rename this to just checksums(...)
?
Ok, going to purge this and start all over I guess :) |
You should be able to just call:
|
that is what I am doing On Thu, Feb 11, 2016 at 3:32 PM Jason Wilder notifications@github.com
|
@seiflotfy Sorry for the delays on this. If you could give this a final rebase, I'll try to get it merged in. A quick glance says it should be pretty quick. Thanks! |
I pushed by mistake (its late here). Don't merge yet since i need to test it out properly |
@seiflotfy no problem! i'll check back in later. |
it works but its SLOW. Do not merge, I will try to speed it up |
@toddboom @benbjohnson I added a cache to speed things up. Please have a look at the changes to writer.go and tell me if it makes sense? |
mu sync.RWMutex | ||
size uint32 | ||
blocks map[string]*indexEntries | ||
cachedSortedKeys []string |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This will double the size of the index in memory when writing TSM files during compactions. Can you revert this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
sure
On Tue, Apr 19, 2016 at 7:59 AM Jason Wilder notifications@github.com
wrote:
In tsdb/engine/tsm1/writer.go
#5582 (comment):@@ -264,9 +264,10 @@ func NewDirectIndex() TSMIndex {
// directIndex is a simple in-memory index implementation for a TSM file. The full index
// must fit in memory.
type directIndex struct {
- mu sync.RWMutex
- size uint32
- blocks map[string]*indexEntries
- mu sync.RWMutex
- size uint32
- blocks map[string]*indexEntries
- cachedSortedKeys []string
This will double the size of the index in memory when writing TSM files
during compactions. Can you revert this?—
You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
https://github.com/influxdata/influxdb/pull/5582/files/ff72a3debddbae9be9a92d59f66cc3bc6394e66c#r60176562
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think there needs to be a way to not have to resort the keys when calling Keys() every time. what do you think?
@jwilder done |
👍 Looks like another rebase is needed. |
done On Tue, Apr 19, 2016 at 10:18 PM Jason Wilder notifications@github.com
|
Thanks @seiflotfy! |
Fixes #5502
Output is in the form of: