
sh build_index.sh #14

Open
followcxy opened this issue Jan 12, 2022 · 2 comments

Comments

@followcxy

I get an error when running the build_index step:
Traceback (most recent call last):
  File "build_index.py", line 114, in <module>
    main(args)
  File "build_index.py", line 85, in main
    used_data, used_ids, max_norm = get_features(args.batch_size, args.norm_th, vocab, model, used_data, used_ids, max_norm_cf=args.max_norm_cf)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/autograd/grad_mode.py", line 26, in decorate_context
    return func(*args, **kwargs)
  File "/root/autodl-tmp/TM/retriever.py", line 342, in get_features
    cur_vecs = model(batch, batch_first=True).detach().cpu().numpy()
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 161, in forward
    outputs = self.parallel_apply(replicas, inputs, kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 171, in parallel_apply
    return parallel_apply(replicas, inputs, kwargs, self.device_ids[:len(replicas)])
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 86, in parallel_apply
    output.reraise()
  File "/root/miniconda3/lib/python3.8/site-packages/torch/_utils.py", line 428, in reraise
    raise self.exc_type(msg)
TypeError: Caught TypeError in replica 1 on device 1.
Original Traceback (most recent call last):
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/parallel/parallel_apply.py", line 61, in _worker
    output = module(*input, **kwargs)
  File "/root/miniconda3/lib/python3.8/site-packages/torch/nn/modules/module.py", line 727, in _call_impl
    result = self.forward(*input, **kwargs)
TypeError: forward() missing 1 required positional argument: 'input_ids'
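It looks like a multi-GPU problem, but I can see that the code already specifies GPU 0, so I'm not sure what is going wrong here. Could you help take a look?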
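One plausible reading of this traceback (an assumption; the thread does not confirm it): `nn.DataParallel` uses every visible GPU by default (`device_ids` defaults to all devices), no matter which device the model was moved to. It splits positional tensor inputs along the batch dimension, but copies non-tensor keyword arguments such as `batch_first=True` to every replica. If a batch holds fewer examples than there are GPUs, a replica can receive the keyword argument but no `input_ids`. The sketch below is a simplified, stdlib-only model of that splitting, not PyTorch's actual code; it assumes the split follows `torch.chunk`'s documented rule, and `chunk_sizes`, `simulate_scatter`, and `forward` are hypothetical names.

```python
import math
from typing import Any, Dict, List, Sequence, Tuple


def chunk_sizes(batch_size: int, num_devices: int) -> List[int]:
    """Chunk sizes under torch.chunk's documented rule: every chunk holds
    ceil(batch_size / num_devices) items, so fewer chunks than devices
    may come back."""
    if batch_size <= 0:
        return []
    per_chunk = math.ceil(batch_size / num_devices)
    sizes, remaining = [], batch_size
    while remaining > 0:
        take = min(per_chunk, remaining)
        sizes.append(take)
        remaining -= take
    return sizes


def simulate_scatter(batch: Sequence[Any], kwargs: Dict[str, Any],
                     num_devices: int) -> List[Tuple[tuple, Dict[str, Any]]]:
    """Hypothetical model of DataParallel's input scattering: the positional
    batch is chunked, non-tensor kwargs are copied to every replica."""
    calls, start = [], 0
    sizes = chunk_sizes(len(batch), num_devices)
    for device in range(num_devices):
        if device < len(sizes):
            args = (list(batch[start:start + sizes[device]]),)
            start += sizes[device]
        else:
            args = ()  # this replica's forward() receives no input_ids
        calls.append((args, dict(kwargs)))
    return calls


def forward(input_ids, batch_first=False):
    """Stand-in for the model's forward(); only the signature matters here."""
    return input_ids


if __name__ == "__main__":
    # A final batch with one example, spread over two visible GPUs.
    calls = simulate_scatter([[101, 2023, 102]], {"batch_first": True}, 2)
    for replica, (args, kwargs) in enumerate(calls):
        try:
            forward(*args, **kwargs)
            print(f"replica {replica}: ok")
        except TypeError as exc:
            print(f"replica {replica}: {exc}")
```

Running it, replica 0 succeeds and replica 1 raises the same "missing 1 required positional argument: 'input_ids'" TypeError seen above. If this is the cause, hiding the extra GPUs with `CUDA_VISIBLE_DEVICES` before `torch` is imported, not wrapping the model in `nn.DataParallel`, or skipping a final batch smaller than the GPU count should each avoid it.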

@followcxy
Author

If I add os.environ["CUDA_VISIBLE_DEVICES"]='0', or run on a machine with only one GPU, I get the following error:
Traceback (most recent call last):
  File "build_index.py", line 115, in <module>
    main(args)
  File "build_index.py", line 90, in main
    mips.train(used_data)
  File "/root/autodl-tmp/TM/mips.py", line 43, in train
    self.index.train(data)
  File "/root/miniconda3/lib/python3.8/site-packages/faiss/__init__.py", line 144, in replacement_train
    self.train_c(n, swig_ptr(x))
  File "/root/miniconda3/lib/python3.8/site-packages/faiss/swigfaiss.py", line 4134, in train
    return _swigfaiss.GpuIndexIVFScalarQuantizer_train(self, n, x)
RuntimeError: Error in virtual void faiss::Clustering::train(faiss::Clustering::idx_t, const float*, faiss::Index&) at Clustering.cpp:82: Error: 'nx >= k' failed: Number of training points (1) should be at least as large as number of clusters (1024)
I'd really appreciate your help. Thank you!

@rangehow

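Open your data file and check whether it ends up as a single record, for example because of line breaks or something similar. I remember someone reported this bug before. This faiss index type needs a minimum amount of data before it can be trained.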
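The error message states the requirement directly: k-means training for this IVF index with 1024 clusters needs at least 1024 training vectors, and it received 1. A pre-flight check along the lines of the maintainer's suggestion might look like the following sketch. It assumes the data file holds one record per line (the actual format read by `build_index.py` is not shown in this thread); `load_records` and `check_ivf_training_size` are hypothetical helpers, and 39 is faiss's default `min_points_per_centroid`, below which faiss only prints a warning rather than failing.

```python
import warnings
from pathlib import Path
from typing import List

# faiss ClusteringParameters default: fewer than this many points per
# centroid only triggers a warning; fewer points than centroids is the
# hard "nx >= k" failure from the traceback.
MIN_POINTS_PER_CENTROID = 39


def load_records(path: str) -> List[str]:
    """Assumed format: one record per non-blank line. read_text() uses
    universal newlines, so '\\r\\n' and '\\r' endings also split records."""
    text = Path(path).read_text(encoding="utf-8")
    return [line for line in text.splitlines() if line.strip()]


def check_ivf_training_size(num_vectors: int, nlist: int) -> None:
    """Fail early, before index.train(), if an IVF index cannot be trained."""
    if num_vectors < nlist:
        raise ValueError(
            f"only {num_vectors} training vectors for {nlist} clusters; "
            "faiss needs at least as many vectors as clusters")
    recommended = MIN_POINTS_PER_CENTROID * nlist
    if num_vectors < recommended:
        warnings.warn(
            f"{num_vectors} training vectors for {nlist} clusters; "
            f"faiss recommends at least {recommended}")


if __name__ == "__main__":
    import sys

    records = load_records(sys.argv[1])
    print(f"{len(records)} records found")
    check_ivf_training_size(len(records), nlist=1024)
```

If the record count printed here is 1 while the file visibly holds many entries, the loader is probably not splitting records the way the data is laid out, which matches the maintainer's hint about line breaks.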
