SecBERT
is a BERT model trained on cyber security text; it has learned cyber security domain knowledge.
SecBERT is trained on papers from a cyber security text corpus.
SecBERT has its own vocabulary (secvocab) that is built to best match the training corpus. We trained both SecBERT and SecRoBERTa versions.
SecBERT models are now available directly within Hugging Face's transformers framework:
from transformers import AutoTokenizer, AutoModelForMaskedLM

# SecBERT
tokenizer = AutoTokenizer.from_pretrained("jackaduma/SecBERT")
model = AutoModelForMaskedLM.from_pretrained("jackaduma/SecBERT")

# SecRoBERTa
tokenizer = AutoTokenizer.from_pretrained("jackaduma/SecRoBERTa")
model = AutoModelForMaskedLM.from_pretrained("jackaduma/SecRoBERTa")
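As a quick sanity check of the custom secvocab vocabulary, you can compare how the SecBERT tokenizer and the original BERT tokenizer split the same sentence. This is only an illustrative sketch; the example sentence below is made up and is not part of the training corpus:

from transformers import AutoTokenizer

# Load the SecBERT tokenizer (custom secvocab) and the original BERT tokenizer.
sec_tokenizer = AutoTokenizer.from_pretrained("jackaduma/SecBERT")
base_tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")

text = "The malware established persistence through a registry run key."

# Compare how each vocabulary splits the same cyber security sentence.
print("SecBERT:", sec_tokenizer.tokenize(text))
print("BERT:   ", base_tokenizer.tokenize(text))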
We release the PyTorch version of the trained models. The PyTorch version is created using the Hugging Face library, and this repo shows how to use it.
SecBERT models include all the necessary files to be plugged into your own model and are in the same format as BERT.
If you use PyTorch, refer to Hugging Face's repo where detailed instructions on using BERT models are provided.
We propose building a language model that works on cyber security text; as a result, it can improve downstream tasks (NER, text classification, semantic understanding, Q&A) in the cyber security domain.
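As a minimal sketch of reusing SecBERT for such a downstream task, you can load the weights behind a standard Hugging Face classification head and fine-tune it on your own labeled data. The two-label setup below is a placeholder for illustration, not something shipped with this repo:

from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("jackaduma/SecBERT")
# The classification head is randomly initialized; num_labels=2 is only an example.
model = AutoModelForSequenceClassification.from_pretrained("jackaduma/SecBERT", num_labels=2)

inputs = tokenizer("Attackers exploited a remote code execution vulnerability.", return_tensors="pt")
logits = model(**inputs).logits  # fine-tune on labeled data before relying on these scores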
First, the example below shows the fill-mask pipeline in Google's BERT, AllenAI's SciBERT, and our SecBERT.
cd lm
python eval_fillmask_lm.py
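A rough, standalone equivalent of that comparison using the Hugging Face fill-mask pipeline might look like the following. The masked sentence is only an illustration and its predictions are not taken from the script above:

from transformers import pipeline

# Models to compare on the same masked sentence.
models = {
    "BERT": "bert-base-uncased",
    "SciBERT": "allenai/scibert_scivocab_uncased",
    "SecBERT": "jackaduma/SecBERT",
}

text = "The attacker used a phishing [MASK] to deliver the malicious payload."

for name, model_id in models.items():
    fill = pipeline("fill-mask", model=model_id)
    # Each tokenizer defines its own mask token ([MASK] for BERT-style models).
    masked = text.replace("[MASK]", fill.tokenizer.mask_token)
    predictions = fill(masked, top_k=3)
    print(name, [p["token_str"] for p in predictions])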
If this project helps you reduce development time, you can buy me a cup of coffee :)
AliPay (支付宝)
WeChat Pay (微信)
MIT © Kun