Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Block checksum validation and "influx_inspect verify" tool #5582

Merged
merged 2 commits into from
Apr 19, 2016
Merged

Add Block checksum validation and "influx_inspect verify" tool #5582

merged 2 commits into from
Apr 19, 2016

Conversation

seiflotfy
Copy link
Contributor

Fixes #5502

Output is in the form of:

[Broken Block] ShardID: 3    File: /Users/seif/.influxdb/data/stress/default/3/000000001-000000001.tsm   Key: cpu,host=server-23136,location=us-west#!~#value    Block: min=2006-01-02 00:00:10 +0000 UTC max=2006-01-02 00:04:10 +0000 UTC ofs=1220899 siz=82
[Broken Block] ShardID: 3    File: /Users/seif/.influxdb/data/stress/default/3/000000001-000000001.tsm   Key: cpu,host=server-48876,location=us-west#!~#value    Block: min=2006-01-02 00:00:10 +0000 UTC max=2006-01-02 00:04:10 +0000 UTC ofs=3612460 siz=82
Broken Blocks: 2 / 100000

tstore.EngineOptions.Config.Dir = filepath.Join(path, "data")
tstore.EngineOptions.Config.WALLoggingEnabled = false
tstore.EngineOptions.Config.WALDir = filepath.Join(path, "wal")
if err := tstore.Open(); err != nil {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This tool really shouldn't be loading the store and shards. It's meant to be a separate and independent implementation of lower level TSM functions. Calling Open has some side-effects like kicking off compactions, cleanup, etc.. which makes it unsafe to run outside of the server.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Note taken, I will just open the file manually and parse the bytes in based on the TSM structure. Is that what you mean?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes. Checksum is the first 4 bytes of each block.

@jwilder
Copy link
Contributor

jwilder commented Feb 8, 2016

@seiflotfy #5502 was about validating the value we have here: https://github.com/influxdata/influxdb/blob/master/cmd/influx_inspect/tsm.go#L547 since we just print it.

Would you be able to re-work this so that it does not use the tsdb.Store and tsdb.Shard? I think having a separate verify command is a good idea as opposed to tacking on another column into the dumptsm1dev command.

@seiflotfy
Copy link
Contributor Author

I am on it 👍

@seiflotfy
Copy link
Contributor Author

So I updated the code to now "walk" through the data directory getting all the .tsm files. It hurts a bit to do that but I found no other way around it.

The new output looks as follows:

[Broken Block] in File: /Users/seif/.influxdb/data/stress/default/3/000000001-000000001.tsm  Key: cpu,host=server-61547,location=us-west#!~#value   Block: min=2006-01-02 00:00:10 +0000 UTC max=2006-01-02 00:04:10 +0000 UTC ofs=4790017 siz=84
[Broken Block] in File: /Users/seif/.influxdb/data/stress/default/3/000000001-000000001.tsm  Key: cpu,host=server-77729,location=us-west#!~#value   Block: min=2006-01-02 00:00:10 +0000 UTC max=2006-01-02 00:04:00 +0000 UTC ofs=6288168 siz=79
Broken Blocks: 2 / 100003

func (t *TSMReader) ValidateChecksum() map[string]map[string]bool {
checksumMap := make(map[string]map[string]bool)
for _, key := range t.Keys() {
checksums, _ := t.accessor.getChecksums(key)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you need to get a RLock on t.accessor here?

@e-dard
Copy link
Contributor

e-dard commented Feb 9, 2016

LGTM 👍

@seiflotfy
Copy link
Contributor Author

@jwilder is this good to go then?

@@ -99,6 +100,7 @@ type blockAccessor interface {
readBytes(entry *IndexEntry, buf []byte) ([]byte, error)
path() string
close() error
getChecksums(key string) (map[string]bool, error)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you rename this to just checksums(...)?

@seiflotfy
Copy link
Contributor Author

Ok, going to purge this and start all over I guess :)

@jwilder
Copy link
Contributor

jwilder commented Feb 11, 2016

You should be able to just call:

TSMReader.BlockIterator() which will return an iterator that will allow you to walk over every block in a file sequentially. The Read call strips the checksum out currently so the underlying bits would need to be modified to return that instead of stripping it. fileAccessor and mmapAccessor should only need a small change to readBytes

@seiflotfy
Copy link
Contributor Author

that is what I am doing

On Thu, Feb 11, 2016 at 3:32 PM Jason Wilder notifications@github.com
wrote:

You should be able to just call:

TSMReader.BlockIterator() which will return an iterator that will allow
you to walk over every block in a file sequentially. The Read call strips
the checksum out currently so the underlying bits would need to be modified
to return that instead of stripping it. fileAccessor and mmapAccessor
should only need a small change to readBytes


Reply to this email directly or view it on GitHub
#5582 (comment).

@toddboom
Copy link
Contributor

@seiflotfy Sorry for the delays on this. If you could give this a final rebase, I'll try to get it merged in. A quick glance says it should be pretty quick. Thanks!

@seiflotfy
Copy link
Contributor Author

I pushed by mistake (its late here). Don't merge yet since i need to test it out properly

@toddboom
Copy link
Contributor

@seiflotfy no problem! i'll check back in later.

@seiflotfy
Copy link
Contributor Author

it works but its SLOW. Do not merge, I will try to speed it up

@seiflotfy
Copy link
Contributor Author

@toddboom @benbjohnson I added a cache to speed things up. Please have a look at the changes to writer.go and tell me if it makes sense?

mu sync.RWMutex
size uint32
blocks map[string]*indexEntries
cachedSortedKeys []string
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This will double the size of the index in memory when writing TSM files during compactions. Can you revert this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sure

On Tue, Apr 19, 2016 at 7:59 AM Jason Wilder notifications@github.com
wrote:

In tsdb/engine/tsm1/writer.go
#5582 (comment):

@@ -264,9 +264,10 @@ func NewDirectIndex() TSMIndex {
// directIndex is a simple in-memory index implementation for a TSM file. The full index
// must fit in memory.
type directIndex struct {

  • mu sync.RWMutex
  • size uint32
  • blocks map[string]*indexEntries
  • mu sync.RWMutex
  • size uint32
  • blocks map[string]*indexEntries
  • cachedSortedKeys []string

This will double the size of the index in memory when writing TSM files
during compactions. Can you revert this?


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
https://github.com/influxdata/influxdb/pull/5582/files/ff72a3debddbae9be9a92d59f66cc3bc6394e66c#r60176562

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there needs to be a way to not have to resort the keys when calling Keys() every time. what do you think?

@seiflotfy
Copy link
Contributor Author

@jwilder done

@seiflotfy
Copy link
Contributor Author

@toddboom @jwilder its much faster now... thanks. Don't know what the appveyor test is though

@jwilder
Copy link
Contributor

jwilder commented Apr 19, 2016

👍 Looks like another rebase is needed.

@seiflotfy
Copy link
Contributor Author

done

On Tue, Apr 19, 2016 at 10:18 PM Jason Wilder notifications@github.com
wrote:

👍 Looks like another rebase is needed.


You are receiving this because you were mentioned.
Reply to this email directly or view it on GitHub
#5582 (comment)

@jwilder jwilder merged commit 7ca9992 into influxdata:master Apr 19, 2016
@jwilder
Copy link
Contributor

jwilder commented Apr 19, 2016

Thanks @seiflotfy!

@jwilder jwilder added this to the 0.13.0 milestone Apr 19, 2016
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants