This repository has been archived by the owner on Jul 24, 2024. It is now read-only.

Lightning panic (runtime error: invalid memory address or nil pointer dereference) #1213

Closed
fubinzh opened this issue Jun 15, 2021 · 3 comments

fubinzh commented Jun 15, 2021

Please answer these questions before submitting your issue. Thanks!

  1. What did you do?
    If possible, provide a recipe for reproducing the error.

Used tidb-lightning to import data into TiDB (4 Lightning nodes, each importing 2.5 TB of data).

  2. What did you expect to see?

The Lightning import should succeed.

  3. What did you see instead?

The Lightning import failed with the errors below.

lightning_panic

  4. What version of BR and TiDB/TiKV/PD are you using?

tidb-lightning:v5.1.0-20210608

  5. Operation logs
    • Please upload br.log for BR if possible
    • Please upload tidb-lightning.log for TiDB-Lightning if possible
    • Please upload tikv-importer.log from TiKV-Importer if possible
    • Other interesting logs

Logs can be found here: http://minio.pingcap.net:9000/minio/nfs/10TB_Testing/table31/

  6. Configuration of the cluster and the task
    • tidb-lightning.toml for TiDB-Lightning if possible
    • tikv-importer.toml for TiKV-Importer if possible
    • topology.yml if deployed by TiUP

  7. Screenshot/exported-PDF of Grafana dashboard or metrics' graph in Prometheus if possible

Contributor

sleepymole commented Jun 21, 2021

The root cause may be related to kv.CommonHandle.Next().

// Next implements the Handle interface.
func (ch *CommonHandle) Next() Handle {
	return &CommonHandle{
		encoded:       Key(ch.encoded).PrefixNext(),
		colEndOffsets: ch.colEndOffsets,
	}
}
func (k Key) PrefixNext() Key {
	buf := make([]byte, len(k))
	copy(buf, k)
	var i int
	for i = len(k) - 1; i >= 0; i-- {
		buf[i]++
		if buf[i] != 0 {
			break
		}
	}
	if i == -1 {
		copy(buf, k)
		buf = append(buf, 0)
	}
	return buf
}

The next handle of a CommonHandle is probably not a valid handle, because the Next implementation just adds one to the encoded bytes (with carry) instead of producing the encoding of a real next value. I'm not sure whether this is by design or a bug.

For a row key, nextKey first decodes the key and then re-encodes it with the next handle to get the next row key:

if tablecodec.IsRecordKey(key) {
	tableID, handle, _ := tablecodec.DecodeRecordKey(key)
	return tablecodec.EncodeRowKeyWithHandle(tableID, handle.Next())
}

Lightning first calls nextKey in readAndSplitIntoRange:

func (local *local) readAndSplitIntoRange(engineFile *File) ([]Range, error) {
	iter := engineFile.db.NewIter(&pebble.IterOptions{LowerBound: normalIterStartKey})
	defer iter.Close()
	iterError := func(e string) error {
		err := iter.Error()
		if err != nil {
			return errors.Annotate(err, e)
		}
		return errors.New(e)
	}
	var firstKey, lastKey []byte
	if iter.First() {
		firstKey = append([]byte{}, iter.Key()...)
	} else {
		return nil, iterError("could not find first pair")
	}
	if iter.Last() {
		lastKey = append([]byte{}, iter.Key()...)
	} else {
		return nil, iterError("could not find last pair")
	}
	endKey := nextKey(lastKey)
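Lightning then probably calls nextKey again on that end key when it logs an empty range:

if !hasKey {
	log.L().Info("There is no pairs in iterator",
		logutil.Key("start", start),
		logutil.Key("end", end),
		logutil.Key("next end", nextKey(end)))
	engineFile.finishedRanges.add(Range{start: start, end: end})
	return nil
}

But the handle in that key is invalid, so nextKey cannot decode it. Since the snippet above discards the error from DecodeRecordKey, handle is presumably nil at that point, and calling handle.Next() panics with the nil pointer dereference.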
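The failure mode can be reproduced in isolation. The sketch below is an assumption-laden stand-in, not Lightning's code and not necessarily what the eventual fix does: Handle, intHandle, decodeHandle, nextUnchecked and nextChecked are all hypothetical names. It only shows that discarding a decode error leaves a nil interface value, and that calling a method on it panics, while propagating the error does not.

```go
package main

import (
	"errors"
	"fmt"
)

// Handle is a reduced, hypothetical stand-in for kv.Handle.
type Handle interface {
	Next() Handle
}

type intHandle int64

func (h intHandle) Next() Handle { return h + 1 }

// decodeHandle is a hypothetical decoder: exactly one byte is a valid handle.
func decodeHandle(key []byte) (Handle, error) {
	if len(key) != 1 {
		return nil, errors.New("invalid handle encoding")
	}
	return intHandle(key[0]), nil
}

// nextUnchecked discards the decode error, like the quoted nextKey snippet.
func nextUnchecked(key []byte) Handle {
	h, _ := decodeHandle(key)
	return h.Next() // h is a nil interface on decode failure: this panics
}

// nextChecked propagates the decode error instead of calling Next on nil.
func nextChecked(key []byte) (Handle, error) {
	h, err := decodeHandle(key)
	if err != nil {
		return nil, err
	}
	return h.Next(), nil
}

// didPanic reports whether f panicked.
func didPanic(f func()) (panicked bool) {
	defer func() {
		if r := recover(); r != nil {
			panicked = true
		}
	}()
	f()
	return false
}

func main() {
	bad := []byte{0x01, 0x00} // not decodable by the toy decoder

	fmt.Println(didPanic(func() { nextUnchecked(bad) })) // true

	_, err := nextChecked(bad)
	fmt.Println(err) // invalid handle encoding
}
```

Checking the error would turn the panic into a reportable failure, but it would not by itself make the key produced by the first nextKey call decodable.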

@sleepymole
Contributor

/cc @glorv

@sleepymole
Contributor

Fixed by #1261.
