Pull requests and contributions are more than welcome. Here is a list of things that would be great to have done. These tasks could be a good fit for, for instance, undergrads interested in getting into NLP. For more detail on any of these tasks, please feel free to email me.
- Enable multi-GPU training and prediction. There's a tutorial on how to do this.
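  As a rough starting point (assuming the project is on a recent AllenNLP version), AllenNLP supports multi-GPU training via a `distributed` section in the training config; the device IDs below are placeholders:

  ```jsonnet
  {
    // ... dataset_reader, model, trainer, etc. stay as-is ...
    "distributed": {
      "cuda_devices": [0, 1]  // GPUs to train on; adjust to your machine
    }
  }
  ```

  Prediction would need separate handling, e.g. sharding the input file across processes.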
- Re-factor and comment the modeling code. Basically all of the modeling is accomplished by enumerating spans, and then running them through unary or binary scoring functions. Because this was written as research code, a lot of functionality is duplicated. Re-factoring could make the code much easier to extend. If interested, email me and I'll provide more info.
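  As a minimal sketch of this pattern, here is span enumeration with placeholder unary and binary scorers. The function names are illustrative only, not the repo's actual API; in the real model the scorers operate on learned span embeddings:

  ```python
  from itertools import combinations

  def enumerate_spans(tokens, max_width):
      """Enumerate all (start, end) spans up to max_width tokens, end inclusive."""
      spans = []
      for start in range(len(tokens)):
          for end in range(start, min(start + max_width, len(tokens))):
              spans.append((start, end))
      return spans

  def unary_score(span):
      # Placeholder: a real model scores a span embedding (e.g. for NER).
      return 0.0

  def binary_score(span1, span2):
      # Placeholder: a real model scores a span pair (e.g. for relations).
      return 0.0

  tokens = ["Barack", "Obama", "visited", "Paris"]
  spans = enumerate_spans(tokens, max_width=2)
  unary_scores = {span: unary_score(span) for span in spans}
  binary_scores = {pair: binary_score(*pair) for pair in combinations(spans, 2)}
  ```

  Because each task (NER, relations, coreference, events) repeats this enumerate-then-score loop, a shared abstraction over it is where refactoring would pay off.
  
  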
- Clean up the documentation. The documentation could be more organized and concise. In particular, I don't do a great job explaining how to use a pretrained model to make predictions on a new dataset.
- Enable multi-namespace prediction. Right now, when using a pretrained model to make predictions on a new dataset, the user specifies which label namespace the model should use by setting the `dataset` field in the new dataset. Ideally, the user should be able to request predictions for multiple different label namespaces, for instance via a flag to `allennlp predict`. For more information on label namespaces, see the information on multi-dataset training.
- Enable training with batch sizes other than 1. See the final "Problem" in batching and batch size for more information on why this would be helpful.