diff --git a/docs/_posts/ahmedlone127/2024-07-01-mpnet_base_token_classifier_en.md b/docs/_posts/ahmedlone127/2024-07-01-mpnet_base_token_classifier_en.md
new file mode 100644
index 00000000000000..b8bee8923f97bd
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-07-01-mpnet_base_token_classifier_en.md
@@ -0,0 +1,91 @@
+---
+layout: model
+title: MPNetForTokenClassification Base Model English
+author: John Snow Labs
+name: mpnet_base_token_classifier
+date: 2024-07-01
+tags: [token_classification, mpnet, ner, en, open_source, onnx]
+task: Named Entity Recognition
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+engine: onnx
+annotator: MPNetForTokenClassification
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained MPNetForTokenClassification model, fine-tuned in-house with Hugging Face and then imported into Spark NLP to provide scalability and production-readiness.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mpnet_base_token_classifier_en_5.4.0_3.0_1719843589238.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mpnet_base_token_classifier_en_5.4.0_3.0_1719843589238.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+tokenizer = Tokenizer() \
+    .setInputCols(['document']) \
+    .setOutputCol('token')
+
+tokenClassifier = MPNetForTokenClassification.pretrained("mpnet_base_token_classifier","en") \
+    .setInputCols(["document","token"]) \
+    .setOutputCol("ner")
+
+pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier])
+data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val tokenizer = new Tokenizer()
+    .setInputCols("document")
+    .setOutputCol("token")
+
+val tokenClassifier = MPNetForTokenClassification.pretrained("mpnet_base_token_classifier", "en")
+    .setInputCols(Array("document","token"))
+    .setOutputCol("ner")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier))
+val data = Seq("I love spark-nlp").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|mpnet_base_token_classifier|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Input Labels:|[token, document]|
+|Output Labels:|[label]|
+|Language:|en|
+|Size:|395.9 MB|
+|Case sensitive:|true|
\ No newline at end of file
diff --git a/docs/_posts/ahmedlone127/2024-07-03-mistral_7b_en.md b/docs/_posts/ahmedlone127/2024-07-03-mistral_7b_en.md
new file mode 100644
index 00000000000000..e6ce7cf9ed4cfc
--- /dev/null
+++ b/docs/_posts/ahmedlone127/2024-07-03-mistral_7b_en.md
@@ -0,0 +1,84 @@
+---
+layout: model
+title: Mistral text-to-text model 7b int8
+author: John Snow Labs
+name: mistral_7b
+date: 2024-07-03
+tags: [mistral, en, llm, open_source, openvino]
+task: Text Generation
+language: en
+edition: Spark NLP 5.4.0
+spark_version: 3.0
+supported: true
+engine: openvino
+annotator: MistralTransformer
+article_header:
+  type: cover
+use_language_switcher: "Python-Scala-Java"
+---
+
+## Description
+
+Pretrained MistralTransformer, adapted and imported into Spark NLP.
+
+{:.btn-box}
+
+
+[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mistral_7b_en_5.4.0_3.0_1720021606199.zip){:.button.button-orange.button-orange-trans.arr.button-icon}
+[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mistral_7b_en_5.4.0_3.0_1720021606199.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3}
+
+## How to use
+
+
+
+
+{% include programmingLanguageSelectScalaPythonNLU.html %}
+```python
+
+documentAssembler = DocumentAssembler() \
+    .setInputCol('text') \
+    .setOutputCol('document')
+
+mistral = MistralTransformer.pretrained() \
+    .setMaxOutputLength(50) \
+    .setDoSample(False) \
+    .setInputCols(["document"]) \
+    .setOutputCol("mistral_generation")
+
+pipeline = Pipeline().setStages([documentAssembler, mistral])
+data = spark.createDataFrame([["Who is the founder of Spark-NLP?"]]).toDF("text")
+pipelineModel = pipeline.fit(data)
+pipelineDF = pipelineModel.transform(data)
+
+```
+```scala
+
+val documentAssembler = new DocumentAssembler()
+    .setInputCol("text")
+    .setOutputCol("document")
+
+val mistral = MistralTransformer.pretrained()
+    .setMaxOutputLength(50)
+    .setDoSample(false)
+    .setInputCols(Array("document"))
+    .setOutputCol("mistral_generation")
+
+val pipeline = new Pipeline().setStages(Array(documentAssembler, mistral))
+val data = Seq("Who is the founder of Spark-NLP?").toDS.toDF("text")
+val pipelineModel = pipeline.fit(data)
+val pipelineDF = pipelineModel.transform(data)
+
+```
+
+
+{:.model-param}
+## Model Information
+
+{:.table-model}
+|---|---|
+|Model Name:|mistral_7b|
+|Compatibility:|Spark NLP 5.4.0+|
+|License:|Open Source|
+|Edition:|Official|
+|Language:|en|
+|Size:|6.6 GB|