Resources of pre-trained language models on clinical texts.
As of July 8, 2019, the following models have been made available:
- ELMo

  Each `.tar.gz` file contains two items: a `.json` file with the pre-training architecture and a `.hdf5` file with the pre-trained weights (see the loading sketch after this list).

- BERT
  - Large Cased Models
  - Base Cased Models

  Each `.tar.gz` file contains a TensorFlow checkpoint (`model.ckpt.*`, which is actually 3 files) holding the pre-trained weights (see the conversion sketch after this list). We followed the authors' detailed instructions to set up the pre-training parameters, so the pre-training architecture files (`bert_config.json`) are the same as those of the corresponding released BERT models. The vocabulary list (`vocab.txt`) released by the Google team, consisting of 28,996 word-piece tokens, is also adopted.
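For convenience, here is a minimal loading sketch for the ELMo files, assuming an older `allennlp` release (≤0.9) that still ships `ElmoEmbedder`; the file names below are placeholders for the `.json` and `.hdf5` files extracted from the archive:

```python
# Minimal sketch: embedding a tokenized clinical sentence with the pre-trained ELMo.
# Assumes allennlp <= 0.9 (which provides ElmoEmbedder); the file names are
# placeholders for the .json and .hdf5 files extracted from the .tar.gz archive.
from allennlp.commands.elmo import ElmoEmbedder

options_file = "elmo_options.json"  # pre-training architecture
weight_file = "elmo_weights.hdf5"   # pre-trained weights

elmo = ElmoEmbedder(options_file=options_file, weight_file=weight_file)

# embed_sentence returns a numpy array of shape (3, num_tokens, 1024):
# one 1024-dimensional representation per token for each of the 3 ELMo layers.
tokens = ["The", "patient", "denies", "chest", "pain", "."]
vectors = elmo.embed_sentence(tokens)
print(vectors.shape)
```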
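Likewise, a hedged sketch of converting the BERT TensorFlow checkpoint into a Hugging Face `transformers` model for downstream fine-tuning; the paths are placeholders for the files inside the extracted archive, and the weight conversion requires TensorFlow to be installed alongside `transformers`:

```python
# Minimal sketch: converting the released TensorFlow checkpoint to a PyTorch model
# with Hugging Face transformers. Paths are placeholders for the files found in the
# extracted .tar.gz archive; TensorFlow must be installed for the weight conversion.
from transformers import (
    BertConfig,
    BertForPreTraining,
    BertTokenizer,
    load_tf_weights_in_bert,
)

config = BertConfig.from_json_file("bert_config.json")  # pre-training architecture
model = BertForPreTraining(config)
load_tf_weights_in_bert(model, config, "model.ckpt")     # checkpoint prefix (3 files)

# Cased vocabulary of 28,996 word-piece tokens, so do_lower_case=False.
tokenizer = BertTokenizer("vocab.txt", do_lower_case=False)

# Save in the Hugging Face format for later fine-tuning on clinical NLP tasks.
model.save_pretrained("clinical_bert_pytorch")
tokenizer.save_pretrained("clinical_bert_pytorch")
```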
We are grateful to the authors of BERT and ELMo for making their pre-training code and instructions publicly available. We are also thankful to the MIMIC-III team for providing valuable resources on clinical text. Please follow the instructions to get access to the MIMIC-III data before downloading the above pre-trained models.
If you use the models available in this repository, we would be grateful if you would cite the following paper:
- Si, Yuqi, Jingqi Wang, Hua Xu, and Kirk Roberts. 2019. “Enhancing Clinical Concept Extraction with Contextual Embeddings.” Journal of the American Medical Informatics Association, July, ocz096. https://doi.org/10.1093/jamia/ocz096.
```bibtex
@article{si_enhancing_2019,
  title = {Enhancing clinical concept extraction with contextual embeddings},
  issn = {1527-974X},
  url = {https://academic.oup.com/jamia/advance-article/doi/10.1093/jamia/ocz096/5527248},
  doi = {10.1093/jamia/ocz096},
  language = {en},
  urldate = {2019-07-09},
  journal = {Journal of the American Medical Informatics Association},
  author = {Si, Yuqi and Wang, Jingqi and Xu, Hua and Roberts, Kirk},
  month = jul,
  year = {2019},
  pages = {ocz096}
}
```