diff --git a/docs/_posts/ahmedlone127/2024-09-05-distilbert_base_uncased_finetuned_yelp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-05-distilbert_base_uncased_finetuned_yelp_pipeline_en.md new file mode 100644 index 00000000000000..34137fdde2b597 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-05-distilbert_base_uncased_finetuned_yelp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_yelp_pipeline pipeline DistilBertForSequenceClassification from vinhanguyen +author: John Snow Labs +name: distilbert_base_uncased_finetuned_yelp_pipeline +date: 2024-09-05 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_yelp_pipeline` is a English model originally trained by vinhanguyen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_yelp_pipeline_en_5.5.0_3.0_1725579974379.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_yelp_pipeline_en_5.5.0_3.0_1725579974379.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_yelp_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_yelp_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_yelp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/vinhanguyen/distilbert-base-uncased-finetuned-yelp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-burmese_awesome_wnut_model_navnitan_en.md b/docs/_posts/ahmedlone127/2024-09-07-burmese_awesome_wnut_model_navnitan_en.md new file mode 100644 index 00000000000000..e39234091ce5c5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-burmese_awesome_wnut_model_navnitan_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_wnut_model_navnitan DistilBertForTokenClassification from navnitan +author: John Snow Labs +name: burmese_awesome_wnut_model_navnitan +date: 2024-09-07 +tags: [en, open_source, onnx, token_classification, distilbert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_wnut_model_navnitan` is a English model originally trained by navnitan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_navnitan_en_5.5.0_3.0_1725729691315.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_navnitan_en_5.5.0_3.0_1725729691315.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = DistilBertForTokenClassification.pretrained("burmese_awesome_wnut_model_navnitan","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = DistilBertForTokenClassification.pretrained("burmese_awesome_wnut_model_navnitan", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_wnut_model_navnitan| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/navnitan/my_awesome_wnut_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-07-finetuned_english_tonga_tonga_islands_vietnamese_pien_27_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-07-finetuned_english_tonga_tonga_islands_vietnamese_pien_27_pipeline_en.md new file mode 100644 index 00000000000000..853e17899b736d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-07-finetuned_english_tonga_tonga_islands_vietnamese_pien_27_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuned_english_tonga_tonga_islands_vietnamese_pien_27_pipeline pipeline MarianTransformer from pien-27 +author: John Snow Labs +name: finetuned_english_tonga_tonga_islands_vietnamese_pien_27_pipeline +date: 2024-09-07 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_english_tonga_tonga_islands_vietnamese_pien_27_pipeline` is a English model originally trained by pien-27. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_english_tonga_tonga_islands_vietnamese_pien_27_pipeline_en_5.5.0_3.0_1725747946611.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_english_tonga_tonga_islands_vietnamese_pien_27_pipeline_en_5.5.0_3.0_1725747946611.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuned_english_tonga_tonga_islands_vietnamese_pien_27_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuned_english_tonga_tonga_islands_vietnamese_pien_27_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_english_tonga_tonga_islands_vietnamese_pien_27_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|475.9 MB| + +## References + +https://huggingface.co/pien-27/finetuned-en-to-vi + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-all_mpnet_base_v2_margin_5_epoch_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-all_mpnet_base_v2_margin_5_epoch_1_pipeline_en.md new file mode 100644 index 00000000000000..1284949f980884 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-all_mpnet_base_v2_margin_5_epoch_1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English all_mpnet_base_v2_margin_5_epoch_1_pipeline pipeline MPNetEmbeddings from luiz-and-robert-thesis +author: John Snow Labs +name: all_mpnet_base_v2_margin_5_epoch_1_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_mpnet_base_v2_margin_5_epoch_1_pipeline` is a English model originally trained by luiz-and-robert-thesis. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_mpnet_base_v2_margin_5_epoch_1_pipeline_en_5.5.0_3.0_1725815956427.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_mpnet_base_v2_margin_5_epoch_1_pipeline_en_5.5.0_3.0_1725815956427.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("all_mpnet_base_v2_margin_5_epoch_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("all_mpnet_base_v2_margin_5_epoch_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_mpnet_base_v2_margin_5_epoch_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/luiz-and-robert-thesis/all-mpnet-base-v2-margin-5-epoch-1 + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-customersentiment_en.md b/docs/_posts/ahmedlone127/2024-09-08-customersentiment_en.md new file mode 100644 index 00000000000000..3a659df843adba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-customersentiment_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English customersentiment DistilBertForSequenceClassification from kearney +author: John Snow Labs +name: customersentiment +date: 2024-09-08 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`customersentiment` is a English model originally trained by kearney. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/customersentiment_en_5.5.0_3.0_1725777151848.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/customersentiment_en_5.5.0_3.0_1725777151848.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("customersentiment","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("customersentiment", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|customersentiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/kearney/customersentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-distilbert_base_uncased_finetuned_squad_patrikrac_en.md b/docs/_posts/ahmedlone127/2024-09-08-distilbert_base_uncased_finetuned_squad_patrikrac_en.md new file mode 100644 index 00000000000000..aec6f26431f79b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-distilbert_base_uncased_finetuned_squad_patrikrac_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_patrikrac DistilBertForQuestionAnswering from patrikrac +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_patrikrac +date: 2024-09-08 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_patrikrac` is a English model originally trained by patrikrac. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_patrikrac_en_5.5.0_3.0_1725798077063.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_patrikrac_en_5.5.0_3.0_1725798077063.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_patrikrac","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_patrikrac", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_patrikrac| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/patrikrac/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-final_model1_en.md b/docs/_posts/ahmedlone127/2024-09-08-final_model1_en.md new file mode 100644 index 00000000000000..8b4975752fd0a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-final_model1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English final_model1 DistilBertForSequenceClassification from sachit56 +author: John Snow Labs +name: final_model1 +date: 2024-09-08 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`final_model1` is a English model originally trained by sachit56. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/final_model1_en_5.5.0_3.0_1725774746342.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/final_model1_en_5.5.0_3.0_1725774746342.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("final_model1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("final_model1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|final_model1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sachit56/final_model1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-08-marian_finetuned_kde4_english_tonga_tonga_islands_french_bill1888_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-08-marian_finetuned_kde4_english_tonga_tonga_islands_french_bill1888_pipeline_en.md new file mode 100644 index 00000000000000..d5f0a56c0fcf41 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-08-marian_finetuned_kde4_english_tonga_tonga_islands_french_bill1888_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_bill1888_pipeline pipeline MarianTransformer from bill1888 +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_bill1888_pipeline +date: 2024-09-08 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_bill1888_pipeline` is a English model originally trained by bill1888. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_bill1888_pipeline_en_5.5.0_3.0_1725766387220.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_bill1888_pipeline_en_5.5.0_3.0_1725766387220.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_bill1888_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_bill1888_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_bill1888_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|510.3 MB| + +## References + +https://huggingface.co/bill1888/marian-finetuned-kde4-en-to-fr + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-tiny_bert_0102_6500_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-tiny_bert_0102_6500_pipeline_en.md new file mode 100644 index 00000000000000..aab897e219d12f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-tiny_bert_0102_6500_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tiny_bert_0102_6500_pipeline pipeline AlbertForSequenceClassification from gg-ai +author: John Snow Labs +name: tiny_bert_0102_6500_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tiny_bert_0102_6500_pipeline` is a English model originally trained by gg-ai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tiny_bert_0102_6500_pipeline_en_5.5.0_3.0_1725923814558.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tiny_bert_0102_6500_pipeline_en_5.5.0_3.0_1725923814558.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tiny_bert_0102_6500_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tiny_bert_0102_6500_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tiny_bert_0102_6500_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|20.5 MB| + +## References + +https://huggingface.co/gg-ai/tiny-bert-0102-6500 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-09-xlm_roberta_base_finetuned_panx_german_solvaysphere_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-09-xlm_roberta_base_finetuned_panx_german_solvaysphere_pipeline_en.md new file mode 100644 index 00000000000000..07a16155b900c6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-09-xlm_roberta_base_finetuned_panx_german_solvaysphere_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_solvaysphere_pipeline pipeline XlmRoBertaForTokenClassification from solvaysphere +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_solvaysphere_pipeline +date: 2024-09-09 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_solvaysphere_pipeline` is a English model originally trained by solvaysphere. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_solvaysphere_pipeline_en_5.5.0_3.0_1725922273945.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_solvaysphere_pipeline_en_5.5.0_3.0_1725922273945.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_solvaysphere_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_solvaysphere_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_solvaysphere_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/solvaysphere/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-distilbert_base_uncased_finetuned_imdb_whole_word_masking_alagrine_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-distilbert_base_uncased_finetuned_imdb_whole_word_masking_alagrine_pipeline_en.md new file mode 100644 index 00000000000000..83cb9f54ec61fa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-distilbert_base_uncased_finetuned_imdb_whole_word_masking_alagrine_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_imdb_whole_word_masking_alagrine_pipeline pipeline DistilBertEmbeddings from AlaGrine +author: John Snow Labs +name: distilbert_base_uncased_finetuned_imdb_whole_word_masking_alagrine_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_imdb_whole_word_masking_alagrine_pipeline` is a English model originally trained by AlaGrine. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_imdb_whole_word_masking_alagrine_pipeline_en_5.5.0_3.0_1725995850664.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_imdb_whole_word_masking_alagrine_pipeline_en_5.5.0_3.0_1725995850664.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_imdb_whole_word_masking_alagrine_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_imdb_whole_word_masking_alagrine_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_imdb_whole_word_masking_alagrine_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/AlaGrine/distilbert-base-uncased-finetuned-imdb-whole-word-masking + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_pipeline_en.md new file mode 100644 index 00000000000000..6d9416ec740a98 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_pipeline_en_5.5.0_3.0_1725984057950.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_pipeline_en_5.5.0_3.0_1725984057950.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|250.9 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_data_aug_mrpc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-retrieval_mpnet_dot_finetuned_llama3_synthetic_dataset_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-retrieval_mpnet_dot_finetuned_llama3_synthetic_dataset_pipeline_en.md new file mode 100644 index 00000000000000..728aa497609c66 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-retrieval_mpnet_dot_finetuned_llama3_synthetic_dataset_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English retrieval_mpnet_dot_finetuned_llama3_synthetic_dataset_pipeline pipeline MPNetEmbeddings from antonkirk +author: John Snow Labs +name: retrieval_mpnet_dot_finetuned_llama3_synthetic_dataset_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`retrieval_mpnet_dot_finetuned_llama3_synthetic_dataset_pipeline` is a English model originally trained by antonkirk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/retrieval_mpnet_dot_finetuned_llama3_synthetic_dataset_pipeline_en_5.5.0_3.0_1725936451753.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/retrieval_mpnet_dot_finetuned_llama3_synthetic_dataset_pipeline_en_5.5.0_3.0_1725936451753.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("retrieval_mpnet_dot_finetuned_llama3_synthetic_dataset_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("retrieval_mpnet_dot_finetuned_llama3_synthetic_dataset_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|retrieval_mpnet_dot_finetuned_llama3_synthetic_dataset_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.0 MB| + +## References + +https://huggingface.co/antonkirk/retrieval-mpnet-dot-finetuned-llama3-synthetic-dataset + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-test_false_positive_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-10-test_false_positive_2_pipeline_en.md new file mode 100644 index 00000000000000..5760cc16ae3647 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-test_false_positive_2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English test_false_positive_2_pipeline pipeline MPNetEmbeddings from witty-works +author: John Snow Labs +name: test_false_positive_2_pipeline +date: 2024-09-10 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MPNetEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_false_positive_2_pipeline` is a English model originally trained by witty-works. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_false_positive_2_pipeline_en_5.5.0_3.0_1725936743903.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_false_positive_2_pipeline_en_5.5.0_3.0_1725936743903.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test_false_positive_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test_false_positive_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_false_positive_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.9 MB| + +## References + +https://huggingface.co/witty-works/test_false_positive_2 + +## Included Models + +- DocumentAssembler +- MPNetEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_base_seed42_original_kinyarwanda_amh_eng_train_en.md b/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_base_seed42_original_kinyarwanda_amh_eng_train_en.md new file mode 100644 index 00000000000000..27bf83d80c547a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-xlm_roberta_base_seed42_original_kinyarwanda_amh_eng_train_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_seed42_original_kinyarwanda_amh_eng_train XlmRoBertaForSequenceClassification from shanhy +author: John Snow Labs +name: xlm_roberta_base_seed42_original_kinyarwanda_amh_eng_train +date: 2024-09-10 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_seed42_original_kinyarwanda_amh_eng_train` is a English model originally trained by shanhy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_seed42_original_kinyarwanda_amh_eng_train_en_5.5.0_3.0_1726003610201.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_seed42_original_kinyarwanda_amh_eng_train_en_5.5.0_3.0_1726003610201.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_seed42_original_kinyarwanda_amh_eng_train","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_seed42_original_kinyarwanda_amh_eng_train", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_seed42_original_kinyarwanda_amh_eng_train| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|798.2 MB| + +## References + +https://huggingface.co/shanhy/xlm-roberta-base_seed42_original_kin-amh-eng_train \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-10-xlmroberta_ner_employment_contract_ner_da.md b/docs/_posts/ahmedlone127/2024-09-10-xlmroberta_ner_employment_contract_ner_da.md new file mode 100644 index 00000000000000..b7daeae08362ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-10-xlmroberta_ner_employment_contract_ner_da.md @@ -0,0 +1,112 @@ +--- +layout: model +title: Danish XLMRobertaForTokenClassification Cased model (from saattrupdan) +author: John Snow Labs +name: xlmroberta_ner_employment_contract_ner +date: 2024-09-10 +tags: [da, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: da +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `employment-contract-ner-da` is a Danish model originally trained by `saattrupdan`. + +## Predicted Entities + +`SALARY`, `STARTDATE`, `WORKHOURS`, `WORKPLACE` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_employment_contract_ner_da_5.5.0_3.0_1725974005655.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_employment_contract_ner_da_5.5.0_3.0_1725974005655.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_employment_contract_ner","da") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCols(Array("text")) + .setOutputCols(Array("document")) + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_employment_contract_ner","da") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token', "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("da.ner.xlmr_roberta").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_employment_contract_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|da| +|Size:|798.1 MB| + +## References + +References + +- https://huggingface.co/saattrupdan/employment-contract-ner-da \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-all_roberta_large_v1_banking_1_16_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-all_roberta_large_v1_banking_1_16_5_pipeline_en.md new file mode 100644 index 00000000000000..3a8be89d0266b6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-all_roberta_large_v1_banking_1_16_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English all_roberta_large_v1_banking_1_16_5_pipeline pipeline RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_banking_1_16_5_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_banking_1_16_5_pipeline` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_banking_1_16_5_pipeline_en_5.5.0_3.0_1726060856197.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_banking_1_16_5_pipeline_en_5.5.0_3.0_1726060856197.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("all_roberta_large_v1_banking_1_16_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("all_roberta_large_v1_banking_1_16_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_banking_1_16_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-banking-1-16-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-babyberta_aochildes_2_5m_aochildes_french_with_masking_seed3_finetuned_squad_en.md b/docs/_posts/ahmedlone127/2024-09-11-babyberta_aochildes_2_5m_aochildes_french_with_masking_seed3_finetuned_squad_en.md new file mode 100644 index 00000000000000..f0a291ac2558a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-babyberta_aochildes_2_5m_aochildes_french_with_masking_seed3_finetuned_squad_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English babyberta_aochildes_2_5m_aochildes_french_with_masking_seed3_finetuned_squad RoBertaForQuestionAnswering from lielbin +author: John Snow Labs +name: babyberta_aochildes_2_5m_aochildes_french_with_masking_seed3_finetuned_squad +date: 2024-09-11 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`babyberta_aochildes_2_5m_aochildes_french_with_masking_seed3_finetuned_squad` is a English model originally trained by lielbin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/babyberta_aochildes_2_5m_aochildes_french_with_masking_seed3_finetuned_squad_en_5.5.0_3.0_1726058540470.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/babyberta_aochildes_2_5m_aochildes_french_with_masking_seed3_finetuned_squad_en_5.5.0_3.0_1726058540470.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = RoBertaForQuestionAnswering.pretrained("babyberta_aochildes_2_5m_aochildes_french_with_masking_seed3_finetuned_squad","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = RoBertaForQuestionAnswering.pretrained("babyberta_aochildes_2_5m_aochildes_french_with_masking_seed3_finetuned_squad", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|babyberta_aochildes_2_5m_aochildes_french_with_masking_seed3_finetuned_squad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|32.0 MB| + +## References + +https://huggingface.co/lielbin/BabyBERTa-aochildes_2.5M_aochildes-french-with-Masking-seed3-finetuned-SQuAD \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_qa_model_kjh97_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_qa_model_kjh97_pipeline_en.md new file mode 100644 index 00000000000000..f18a0a6735e10b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-burmese_awesome_qa_model_kjh97_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_awesome_qa_model_kjh97_pipeline pipeline RoBertaForQuestionAnswering from KJH97 +author: John Snow Labs +name: burmese_awesome_qa_model_kjh97_pipeline +date: 2024-09-11 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_kjh97_pipeline` is a English model originally trained by KJH97. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_kjh97_pipeline_en_5.5.0_3.0_1726055879861.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_kjh97_pipeline_en_5.5.0_3.0_1726055879861.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_qa_model_kjh97_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_qa_model_kjh97_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_kjh97_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|463.6 MB| + +## References + +https://huggingface.co/KJH97/my_awesome_qa_model + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-custommodel_finance_sentiment_analytics_en.md b/docs/_posts/ahmedlone127/2024-09-11-custommodel_finance_sentiment_analytics_en.md new file mode 100644 index 00000000000000..aa8491bbb7f2b6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-custommodel_finance_sentiment_analytics_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English custommodel_finance_sentiment_analytics RoBertaForSequenceClassification from WillWEI0103 +author: John Snow Labs +name: custommodel_finance_sentiment_analytics +date: 2024-09-11 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`custommodel_finance_sentiment_analytics` is a English model originally trained by WillWEI0103. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/custommodel_finance_sentiment_analytics_en_5.5.0_3.0_1726022221653.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/custommodel_finance_sentiment_analytics_en_5.5.0_3.0_1726022221653.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("custommodel_finance_sentiment_analytics","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("custommodel_finance_sentiment_analytics", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|custommodel_finance_sentiment_analytics| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.7 MB| + +## References + +https://huggingface.co/WillWEI0103/CustomModel_finance_sentiment_analytics \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-dutch_tonga_tonga_islands_iac_marian_en.md b/docs/_posts/ahmedlone127/2024-09-11-dutch_tonga_tonga_islands_iac_marian_en.md new file mode 100644 index 00000000000000..b8297add2bc69c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-dutch_tonga_tonga_islands_iac_marian_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English dutch_tonga_tonga_islands_iac_marian MarianTransformer from MihaiIonascu +author: John Snow Labs +name: dutch_tonga_tonga_islands_iac_marian +date: 2024-09-11 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dutch_tonga_tonga_islands_iac_marian` is a English model originally trained by MihaiIonascu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dutch_tonga_tonga_islands_iac_marian_en_5.5.0_3.0_1726049225070.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dutch_tonga_tonga_islands_iac_marian_en_5.5.0_3.0_1726049225070.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("translation") + +marian = MarianTransformer.pretrained("dutch_tonga_tonga_islands_iac_marian","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val marian = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = MarianTransformer.pretrained("dutch_tonga_tonga_islands_iac_marian","en") + .setInputCols(Array("sentence")) + .setOutputCol("translation") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dutch_tonga_tonga_islands_iac_marian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|196.3 MB| + +## References + +https://huggingface.co/MihaiIonascu/NL_to_IaC_Marian \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-sent_bert_base_multilingual_cased_finetuned_amharic_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-11-sent_bert_base_multilingual_cased_finetuned_amharic_pipeline_xx.md new file mode 100644 index 00000000000000..58d3934a14344f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-sent_bert_base_multilingual_cased_finetuned_amharic_pipeline_xx.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Multilingual sent_bert_base_multilingual_cased_finetuned_amharic_pipeline pipeline BertSentenceEmbeddings from Davlan +author: John Snow Labs +name: sent_bert_base_multilingual_cased_finetuned_amharic_pipeline +date: 2024-09-11 +tags: [xx, open_source, pipeline, onnx] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_multilingual_cased_finetuned_amharic_pipeline` is a Multilingual model originally trained by Davlan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_cased_finetuned_amharic_pipeline_xx_5.5.0_3.0_1726057230184.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_multilingual_cased_finetuned_amharic_pipeline_xx_5.5.0_3.0_1726057230184.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_multilingual_cased_finetuned_amharic_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_multilingual_cased_finetuned_amharic_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_multilingual_cased_finetuned_amharic_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|665.6 MB| + +## References + +https://huggingface.co/Davlan/bert-base-multilingual-cased-finetuned-amharic + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-11-sent_bert_large_arabic_ar.md b/docs/_posts/ahmedlone127/2024-09-11-sent_bert_large_arabic_ar.md new file mode 100644 index 00000000000000..f239f8473a3bdd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-11-sent_bert_large_arabic_ar.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Arabic sent_bert_large_arabic BertSentenceEmbeddings from asafaya +author: John Snow Labs +name: sent_bert_large_arabic +date: 2024-09-11 +tags: [ar, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_large_arabic` is a Arabic model originally trained by asafaya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_large_arabic_ar_5.5.0_3.0_1726057243306.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_large_arabic_ar_5.5.0_3.0_1726057243306.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_large_arabic","ar") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_large_arabic","ar") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_large_arabic| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|ar| +|Size:|1.3 GB| + +## References + +https://huggingface.co/asafaya/bert-large-arabic \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-angela_shuffle_tokens_regular_eval_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-angela_shuffle_tokens_regular_eval_pipeline_en.md new file mode 100644 index 00000000000000..ce4b064ce0410f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-angela_shuffle_tokens_regular_eval_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English angela_shuffle_tokens_regular_eval_pipeline pipeline XlmRoBertaForTokenClassification from azhang1212 +author: John Snow Labs +name: angela_shuffle_tokens_regular_eval_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`angela_shuffle_tokens_regular_eval_pipeline` is a English model originally trained by azhang1212. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angela_shuffle_tokens_regular_eval_pipeline_en_5.5.0_3.0_1726164969915.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angela_shuffle_tokens_regular_eval_pipeline_en_5.5.0_3.0_1726164969915.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("angela_shuffle_tokens_regular_eval_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("angela_shuffle_tokens_regular_eval_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angela_shuffle_tokens_regular_eval_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/azhang1212/angela_shuffle_tokens_regular_eval + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-bert_csat_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-bert_csat_pipeline_en.md new file mode 100644 index 00000000000000..6c154b80537845 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-bert_csat_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_csat_pipeline pipeline DistilBertForSequenceClassification from MoaazZaki +author: John Snow Labs +name: bert_csat_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_csat_pipeline` is a English model originally trained by MoaazZaki. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_csat_pipeline_en_5.5.0_3.0_1726100663141.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_csat_pipeline_en_5.5.0_3.0_1726100663141.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_csat_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_csat_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_csat_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/MoaazZaki/bert-csat + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-distilbert_sanskrit_saskta_glue_experiment_qnli_256_en.md b/docs/_posts/ahmedlone127/2024-09-12-distilbert_sanskrit_saskta_glue_experiment_qnli_256_en.md new file mode 100644 index 00000000000000..3f704ee7a1d093 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-distilbert_sanskrit_saskta_glue_experiment_qnli_256_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_qnli_256 DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_qnli_256 +date: 2024-09-12 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_qnli_256` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_qnli_256_en_5.5.0_3.0_1726100100380.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_qnli_256_en_5.5.0_3.0_1726100100380.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_qnli_256","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_qnli_256", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_qnli_256| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|71.6 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_qnli_256 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-helsinki_nlp_korean_english_base_test_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-helsinki_nlp_korean_english_base_test_pipeline_en.md new file mode 100644 index 00000000000000..da2f5cf89ac813 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-helsinki_nlp_korean_english_base_test_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English helsinki_nlp_korean_english_base_test_pipeline pipeline MarianTransformer from dalzza +author: John Snow Labs +name: helsinki_nlp_korean_english_base_test_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`helsinki_nlp_korean_english_base_test_pipeline` is a English model originally trained by dalzza. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/helsinki_nlp_korean_english_base_test_pipeline_en_5.5.0_3.0_1726168077854.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/helsinki_nlp_korean_english_base_test_pipeline_en_5.5.0_3.0_1726168077854.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("helsinki_nlp_korean_english_base_test_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("helsinki_nlp_korean_english_base_test_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|helsinki_nlp_korean_english_base_test_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|541.0 MB| + +## References + +https://huggingface.co/dalzza/helsinki-nlp-ko-en-base-test + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-icelandic_there_aragonese_allergy_bert_first512_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-icelandic_there_aragonese_allergy_bert_first512_pipeline_en.md new file mode 100644 index 00000000000000..ff1236082af8b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-icelandic_there_aragonese_allergy_bert_first512_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English icelandic_there_aragonese_allergy_bert_first512_pipeline pipeline BertForSequenceClassification from etadevosyan +author: John Snow Labs +name: icelandic_there_aragonese_allergy_bert_first512_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`icelandic_there_aragonese_allergy_bert_first512_pipeline` is a English model originally trained by etadevosyan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/icelandic_there_aragonese_allergy_bert_first512_pipeline_en_5.5.0_3.0_1726182670516.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/icelandic_there_aragonese_allergy_bert_first512_pipeline_en_5.5.0_3.0_1726182670516.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("icelandic_there_aragonese_allergy_bert_first512_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("icelandic_there_aragonese_allergy_bert_first512_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|icelandic_there_aragonese_allergy_bert_first512_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|666.5 MB| + +## References + +https://huggingface.co/etadevosyan/is_there_an_allergy_bert_First512 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-iwslt17_marian_big_ctx4_cwd0_english_french_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-iwslt17_marian_big_ctx4_cwd0_english_french_pipeline_en.md new file mode 100644 index 00000000000000..7c72f01db1b801 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-iwslt17_marian_big_ctx4_cwd0_english_french_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English iwslt17_marian_big_ctx4_cwd0_english_french_pipeline pipeline MarianTransformer from context-mt +author: John Snow Labs +name: iwslt17_marian_big_ctx4_cwd0_english_french_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`iwslt17_marian_big_ctx4_cwd0_english_french_pipeline` is a English model originally trained by context-mt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/iwslt17_marian_big_ctx4_cwd0_english_french_pipeline_en_5.5.0_3.0_1726161779380.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/iwslt17_marian_big_ctx4_cwd0_english_french_pipeline_en_5.5.0_3.0_1726161779380.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("iwslt17_marian_big_ctx4_cwd0_english_french_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("iwslt17_marian_big_ctx4_cwd0_english_french_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|iwslt17_marian_big_ctx4_cwd0_english_french_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/context-mt/iwslt17-marian-big-ctx4-cwd0-en-fr + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-redo_norwegian_delete_5e_5_hausa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-12-redo_norwegian_delete_5e_5_hausa_pipeline_en.md new file mode 100644 index 00000000000000..a0be91a2aa0dc2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-redo_norwegian_delete_5e_5_hausa_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English redo_norwegian_delete_5e_5_hausa_pipeline pipeline XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: redo_norwegian_delete_5e_5_hausa_pipeline +date: 2024-09-12 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`redo_norwegian_delete_5e_5_hausa_pipeline` is a English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/redo_norwegian_delete_5e_5_hausa_pipeline_en_5.5.0_3.0_1726131820807.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/redo_norwegian_delete_5e_5_hausa_pipeline_en_5.5.0_3.0_1726131820807.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("redo_norwegian_delete_5e_5_hausa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("redo_norwegian_delete_5e_5_hausa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|redo_norwegian_delete_5e_5_hausa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/grace-pro/redo_no_delete_5e-5_hausa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-12-sent_turkish_small_bert_uncased_pipeline_tr.md b/docs/_posts/ahmedlone127/2024-09-12-sent_turkish_small_bert_uncased_pipeline_tr.md new file mode 100644 index 00000000000000..1b7c1c4e7eb64c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-12-sent_turkish_small_bert_uncased_pipeline_tr.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Turkish sent_turkish_small_bert_uncased_pipeline pipeline BertSentenceEmbeddings from ytu-ce-cosmos +author: John Snow Labs +name: sent_turkish_small_bert_uncased_pipeline +date: 2024-09-12 +tags: [tr, open_source, pipeline, onnx] +task: Embeddings +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_turkish_small_bert_uncased_pipeline` is a Turkish model originally trained by ytu-ce-cosmos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_turkish_small_bert_uncased_pipeline_tr_5.5.0_3.0_1726119337304.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_turkish_small_bert_uncased_pipeline_tr_5.5.0_3.0_1726119337304.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_turkish_small_bert_uncased_pipeline", lang = "tr") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_turkish_small_bert_uncased_pipeline", lang = "tr") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_turkish_small_bert_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|tr| +|Size:|110.4 MB| + +## References + +https://huggingface.co/ytu-ce-cosmos/turkish-small-bert-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-distilbert_base_uncased_finetuned_squad_likhith231_en.md b/docs/_posts/ahmedlone127/2024-09-13-distilbert_base_uncased_finetuned_squad_likhith231_en.md new file mode 100644 index 00000000000000..0696367d9e15c0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-distilbert_base_uncased_finetuned_squad_likhith231_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_likhith231 DistilBertForQuestionAnswering from likhith231 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_likhith231 +date: 2024-09-13 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_likhith231` is a English model originally trained by likhith231. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_likhith231_en_5.5.0_3.0_1726245491731.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_likhith231_en_5.5.0_3.0_1726245491731.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_likhith231","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_likhith231", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_likhith231| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/likhith231/distilbert-base-uncased-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-finetuned_sail2017_additionalpretrained_indic_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-finetuned_sail2017_additionalpretrained_indic_bert_pipeline_en.md new file mode 100644 index 00000000000000..6c67992fe950f2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-finetuned_sail2017_additionalpretrained_indic_bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuned_sail2017_additionalpretrained_indic_bert_pipeline pipeline AlbertForSequenceClassification from aditeyabaral +author: John Snow Labs +name: finetuned_sail2017_additionalpretrained_indic_bert_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_sail2017_additionalpretrained_indic_bert_pipeline` is a English model originally trained by aditeyabaral. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_sail2017_additionalpretrained_indic_bert_pipeline_en_5.5.0_3.0_1726188305415.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_sail2017_additionalpretrained_indic_bert_pipeline_en_5.5.0_3.0_1726188305415.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuned_sail2017_additionalpretrained_indic_bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuned_sail2017_additionalpretrained_indic_bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_sail2017_additionalpretrained_indic_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|127.8 MB| + +## References + +https://huggingface.co/aditeyabaral/finetuned-sail2017-additionalpretrained-indic-bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-horai_medium_10k_v4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-horai_medium_10k_v4_pipeline_en.md new file mode 100644 index 00000000000000..97675d78927d9e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-horai_medium_10k_v4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English horai_medium_10k_v4_pipeline pipeline RoBertaForSequenceClassification from stealthwriter +author: John Snow Labs +name: horai_medium_10k_v4_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`horai_medium_10k_v4_pipeline` is a English model originally trained by stealthwriter. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/horai_medium_10k_v4_pipeline_en_5.5.0_3.0_1726187577444.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/horai_medium_10k_v4_pipeline_en_5.5.0_3.0_1726187577444.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("horai_medium_10k_v4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("horai_medium_10k_v4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|horai_medium_10k_v4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|438.9 MB| + +## References + +https://huggingface.co/stealthwriter/HorAI-medium-10k-V4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-job_compatibility_model_en.md b/docs/_posts/ahmedlone127/2024-09-13-job_compatibility_model_en.md new file mode 100644 index 00000000000000..94ec588d337f0d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-job_compatibility_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English job_compatibility_model DistilBertForSequenceClassification from DaJulster +author: John Snow Labs +name: job_compatibility_model +date: 2024-09-13 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`job_compatibility_model` is a English model originally trained by DaJulster. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/job_compatibility_model_en_5.5.0_3.0_1726262430661.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/job_compatibility_model_en_5.5.0_3.0_1726262430661.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("job_compatibility_model","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("job_compatibility_model", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|job_compatibility_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/DaJulster/Job_compatibility_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-roberta_finetuned_subjqa_movies_2_vijayaphani5_en.md b/docs/_posts/ahmedlone127/2024-09-13-roberta_finetuned_subjqa_movies_2_vijayaphani5_en.md new file mode 100644 index 00000000000000..08e6107f8a82e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-roberta_finetuned_subjqa_movies_2_vijayaphani5_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_finetuned_subjqa_movies_2_vijayaphani5 RoBertaForQuestionAnswering from vijayaphani5 +author: John Snow Labs +name: roberta_finetuned_subjqa_movies_2_vijayaphani5 +date: 2024-09-13 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_finetuned_subjqa_movies_2_vijayaphani5` is a English model originally trained by vijayaphani5. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_finetuned_subjqa_movies_2_vijayaphani5_en_5.5.0_3.0_1726207077888.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_finetuned_subjqa_movies_2_vijayaphani5_en_5.5.0_3.0_1726207077888.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_finetuned_subjqa_movies_2_vijayaphani5","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_finetuned_subjqa_movies_2_vijayaphani5", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_finetuned_subjqa_movies_2_vijayaphani5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|464.1 MB| + +## References + +https://huggingface.co/vijayaphani5/roberta-finetuned-subjqa-movies_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-roberta_small_turkish_clean_uncased_tr.md b/docs/_posts/ahmedlone127/2024-09-13-roberta_small_turkish_clean_uncased_tr.md new file mode 100644 index 00000000000000..20074933711ba8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-roberta_small_turkish_clean_uncased_tr.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Turkish roberta_small_turkish_clean_uncased RoBertaEmbeddings from burakaytan +author: John Snow Labs +name: roberta_small_turkish_clean_uncased +date: 2024-09-13 +tags: [tr, open_source, onnx, embeddings, roberta] +task: Embeddings +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_small_turkish_clean_uncased` is a Turkish model originally trained by burakaytan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_small_turkish_clean_uncased_tr_5.5.0_3.0_1726264750214.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_small_turkish_clean_uncased_tr_5.5.0_3.0_1726264750214.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_small_turkish_clean_uncased","tr") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_small_turkish_clean_uncased","tr") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_small_turkish_clean_uncased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|tr| +|Size:|222.3 MB| + +## References + +https://huggingface.co/burakaytan/roberta-small-turkish-clean-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-sent_rubiobert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-13-sent_rubiobert_pipeline_en.md new file mode 100644 index 00000000000000..b0ca1f71e84fb4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-sent_rubiobert_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_rubiobert_pipeline pipeline BertSentenceEmbeddings from alexyalunin +author: John Snow Labs +name: sent_rubiobert_pipeline +date: 2024-09-13 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_rubiobert_pipeline` is a English model originally trained by alexyalunin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_rubiobert_pipeline_en_5.5.0_3.0_1726246257329.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_rubiobert_pipeline_en_5.5.0_3.0_1726246257329.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_rubiobert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_rubiobert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_rubiobert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|667.7 MB| + +## References + +https://huggingface.co/alexyalunin/RuBioBERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-13-whisper_small_swahili_jayem_11_en.md b/docs/_posts/ahmedlone127/2024-09-13-whisper_small_swahili_jayem_11_en.md new file mode 100644 index 00000000000000..f8f98e6644d3c5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-13-whisper_small_swahili_jayem_11_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_swahili_jayem_11 WhisperForCTC from Jayem-11 +author: John Snow Labs +name: whisper_small_swahili_jayem_11 +date: 2024-09-13 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_swahili_jayem_11` is a English model originally trained by Jayem-11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_swahili_jayem_11_en_5.5.0_3.0_1726219179545.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_swahili_jayem_11_en_5.5.0_3.0_1726219179545.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_swahili_jayem_11","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_swahili_jayem_11", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_swahili_jayem_11| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Jayem-11/whisper-small-swahili \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-bert_finetuned_ner_tokenizer_en.md b/docs/_posts/ahmedlone127/2024-09-14-bert_finetuned_ner_tokenizer_en.md new file mode 100644 index 00000000000000..dfabdb96a55268 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-bert_finetuned_ner_tokenizer_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_finetuned_ner_tokenizer BertForTokenClassification from alban12 +author: John Snow Labs +name: bert_finetuned_ner_tokenizer +date: 2024-09-14 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_ner_tokenizer` is a English model originally trained by alban12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_tokenizer_en_5.5.0_3.0_1726305583884.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_tokenizer_en_5.5.0_3.0_1726305583884.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_tokenizer","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_tokenizer", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_tokenizer| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|408.6 MB| + +## References + +https://huggingface.co/alban12/bert-finetuned-ner-tokenizer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-combined_model_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-combined_model_v1_pipeline_en.md new file mode 100644 index 00000000000000..bbf89f2dcc77ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-combined_model_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English combined_model_v1_pipeline pipeline RoBertaForTokenClassification from ICT2214Team7 +author: John Snow Labs +name: combined_model_v1_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`combined_model_v1_pipeline` is a English model originally trained by ICT2214Team7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/combined_model_v1_pipeline_en_5.5.0_3.0_1726316018766.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/combined_model_v1_pipeline_en_5.5.0_3.0_1726316018766.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("combined_model_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("combined_model_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|combined_model_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.6 MB| + +## References + +https://huggingface.co/ICT2214Team7/Combined_model_v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-distilroberta_base_finetuned_jira_qt_issue_titles_and_bodies_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-distilroberta_base_finetuned_jira_qt_issue_titles_and_bodies_pipeline_en.md new file mode 100644 index 00000000000000..b50d1e65dda75f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-distilroberta_base_finetuned_jira_qt_issue_titles_and_bodies_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_finetuned_jira_qt_issue_titles_and_bodies_pipeline pipeline RoBertaEmbeddings from ietz +author: John Snow Labs +name: distilroberta_base_finetuned_jira_qt_issue_titles_and_bodies_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_finetuned_jira_qt_issue_titles_and_bodies_pipeline` is a English model originally trained by ietz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_jira_qt_issue_titles_and_bodies_pipeline_en_5.5.0_3.0_1726338244439.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_jira_qt_issue_titles_and_bodies_pipeline_en_5.5.0_3.0_1726338244439.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilroberta_base_finetuned_jira_qt_issue_titles_and_bodies_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilroberta_base_finetuned_jira_qt_issue_titles_and_bodies_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_finetuned_jira_qt_issue_titles_and_bodies_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/ietz/distilroberta-base-finetuned-jira-qt-issue-titles-and-bodies + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-medical_english_chinese_9_1_pt2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-medical_english_chinese_9_1_pt2_pipeline_en.md new file mode 100644 index 00000000000000..9c7615e752bdb2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-medical_english_chinese_9_1_pt2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English medical_english_chinese_9_1_pt2_pipeline pipeline MarianTransformer from DogGoesBark +author: John Snow Labs +name: medical_english_chinese_9_1_pt2_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`medical_english_chinese_9_1_pt2_pipeline` is a English model originally trained by DogGoesBark. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/medical_english_chinese_9_1_pt2_pipeline_en_5.5.0_3.0_1726351419793.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/medical_english_chinese_9_1_pt2_pipeline_en_5.5.0_3.0_1726351419793.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("medical_english_chinese_9_1_pt2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("medical_english_chinese_9_1_pt2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|medical_english_chinese_9_1_pt2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|541.9 MB| + +## References + +https://huggingface.co/DogGoesBark/medical_en_zh_9_1_pt2 + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-roberta_ner_deid_roberta_i2b2_en.md b/docs/_posts/ahmedlone127/2024-09-14-roberta_ner_deid_roberta_i2b2_en.md new file mode 100644 index 00000000000000..84b3d690777319 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-roberta_ner_deid_roberta_i2b2_en.md @@ -0,0 +1,122 @@ +--- +layout: model +title: English RobertaForTokenClassification Cased model (from obi) +author: John Snow Labs +name: roberta_ner_deid_roberta_i2b2 +date: 2024-09-14 +tags: [bert, ner, open_source, en, onnx, openvino] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: openvino +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `deid_roberta_i2b2` is a English model originally trained by `obi`. + +## Predicted Entities + +`DATE`, `L-AGE`, `U-PATIENT`, `L-STAFF`, `U-OTHERPHI`, `U-ID`, `EMAIL`, `U-LOC`, `L-HOSP`, `L-PATIENT`, `PATIENT`, `PHONE`, `U-PHONE`, `L-OTHERPHI`, `HOSP`, `L-PATORG`, `AGE`, `U-EMAIL`, `L-ID`, `U-HOSP`, `U-AGE`, `OTHERPHI`, `LOC`, `ID`, `U-DATE`, `L-DATE`, `U-PATORG`, `L-PHONE`, `STAFF`, `L-EMAIL`, `PATORG`, `U-STAFF`, `L-LOC` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_ner_deid_roberta_i2b2_en_5.5.0_3.0_1726298204413.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_ner_deid_roberta_i2b2_en_5.5.0_3.0_1726298204413.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx")\ + .setInputCols(["document"])\ + .setOutputCol("sentence") + +tokenizer = Tokenizer() \ + .setInputCols("sentence") \ + .setOutputCol("token") + +tokenClassifier = BertForTokenClassification.pretrained("roberta_ner_deid_roberta_i2b2","en") \ + .setInputCols(["sentence", "token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline(stages=[documentAssembler, sentenceDetector, tokenizer, tokenClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDetector = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val tokenizer = new Tokenizer() + .setInputCols(Array("sentence")) + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("roberta_ner_deid_roberta_i2b2","en") + .setInputCols(Array("sentence", "token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler,sentenceDetector, tokenizer, tokenClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.ner.roberta.by_obi").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_ner_deid_roberta_i2b2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[token, document]| +|Output Labels:|[label]| +|Language:|en| +|Size:|1.3 GB| +|Case sensitive:|true| + +## References + +References + +References + +- https://huggingface.co/obi/deid_roberta_i2b2 +- https://arxiv.org/pdf/1907.11692.pdf +- https://github.com/obi-ml-public/ehr_deidentification/tree/master/steps/train +- https://www.ncbi.nlm.nih.gov/pmc/articles/PMC4978170/ +- https://www.hhs.gov/hipaa/for-professionals/privacy/laws-regulations/index.html +- https://github.com/obi-ml-public/ehr_deidentification +- https://github.com/obi-ml-public/ehr_deidentification/tree/master/steps/forward_pass +- https://github.com/obi-ml-public/ehr_deidentification/blob/master/AnnotationGuidelines.md \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-tse_albert_5e_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-tse_albert_5e_pipeline_en.md new file mode 100644 index 00000000000000..55fbecfd2aa06f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-tse_albert_5e_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tse_albert_5e_pipeline pipeline AlbertForSequenceClassification from pig4431 +author: John Snow Labs +name: tse_albert_5e_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tse_albert_5e_pipeline` is a English model originally trained by pig4431. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tse_albert_5e_pipeline_en_5.5.0_3.0_1726336234917.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tse_albert_5e_pipeline_en_5.5.0_3.0_1726336234917.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tse_albert_5e_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tse_albert_5e_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tse_albert_5e_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|44.2 MB| + +## References + +https://huggingface.co/pig4431/TSE_ALBERT_5E + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-twitter_roberta_base_mar2022_15m_incr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-14-twitter_roberta_base_mar2022_15m_incr_pipeline_en.md new file mode 100644 index 00000000000000..f7ccc4a790e3dd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-twitter_roberta_base_mar2022_15m_incr_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English twitter_roberta_base_mar2022_15m_incr_pipeline pipeline RoBertaEmbeddings from cardiffnlp +author: John Snow Labs +name: twitter_roberta_base_mar2022_15m_incr_pipeline +date: 2024-09-14 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitter_roberta_base_mar2022_15m_incr_pipeline` is a English model originally trained by cardiffnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_mar2022_15m_incr_pipeline_en_5.5.0_3.0_1726300024881.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_mar2022_15m_incr_pipeline_en_5.5.0_3.0_1726300024881.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("twitter_roberta_base_mar2022_15m_incr_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("twitter_roberta_base_mar2022_15m_incr_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_roberta_base_mar2022_15m_incr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/cardiffnlp/twitter-roberta-base-mar2022-15M-incr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-14-whisper_small_indonesian_cahya_id.md b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_indonesian_cahya_id.md new file mode 100644 index 00000000000000..f8722b2dfb9a2a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-14-whisper_small_indonesian_cahya_id.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Indonesian whisper_small_indonesian_cahya WhisperForCTC from cahya +author: John Snow Labs +name: whisper_small_indonesian_cahya +date: 2024-09-14 +tags: [id, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: id +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_indonesian_cahya` is a Indonesian model originally trained by cahya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_indonesian_cahya_id_5.5.0_3.0_1726321577596.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_indonesian_cahya_id_5.5.0_3.0_1726321577596.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_indonesian_cahya","id") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_indonesian_cahya", "id") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_indonesian_cahya| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|id| +|Size:|1.7 GB| + +## References + +https://huggingface.co/cahya/whisper-small-id \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-burmese_awesome_qa_model_wearecoding_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-burmese_awesome_qa_model_wearecoding_pipeline_en.md new file mode 100644 index 00000000000000..6c70fade2b1b4b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-burmese_awesome_qa_model_wearecoding_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_awesome_qa_model_wearecoding_pipeline pipeline DistilBertForQuestionAnswering from wearecoding +author: John Snow Labs +name: burmese_awesome_qa_model_wearecoding_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_wearecoding_pipeline` is a English model originally trained by wearecoding. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_wearecoding_pipeline_en_5.5.0_3.0_1726382846787.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_wearecoding_pipeline_en_5.5.0_3.0_1726382846787.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_qa_model_wearecoding_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_qa_model_wearecoding_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_wearecoding_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/wearecoding/my_awesome_qa_model + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_fituned_clinc_en.md b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_fituned_clinc_en.md new file mode 100644 index 00000000000000..d090c34d1b3549 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-distilbert_base_uncased_fituned_clinc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_fituned_clinc DistilBertForSequenceClassification from Takeshi10Days +author: John Snow Labs +name: distilbert_base_uncased_fituned_clinc +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_fituned_clinc` is a English model originally trained by Takeshi10Days. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_fituned_clinc_en_5.5.0_3.0_1726394190213.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_fituned_clinc_en_5.5.0_3.0_1726394190213.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_fituned_clinc","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_fituned_clinc", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_fituned_clinc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/Takeshi10Days/distilbert-base-uncased-fituned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-finetuning_sentiment_model_3000_samples_sanju2u_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-finetuning_sentiment_model_3000_samples_sanju2u_pipeline_en.md new file mode 100644 index 00000000000000..f4d504a3401e63 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-finetuning_sentiment_model_3000_samples_sanju2u_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_sanju2u_pipeline pipeline DistilBertForSequenceClassification from sanju2u +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_sanju2u_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_sanju2u_pipeline` is a English model originally trained by sanju2u. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_sanju2u_pipeline_en_5.5.0_3.0_1726394012126.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_sanju2u_pipeline_en_5.5.0_3.0_1726394012126.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_sanju2u_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_sanju2u_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_sanju2u_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sanju2u/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-schem_roberta_demographic_text_disagreement_predictor_en.md b/docs/_posts/ahmedlone127/2024-09-15-schem_roberta_demographic_text_disagreement_predictor_en.md new file mode 100644 index 00000000000000..316024af12cdcf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-schem_roberta_demographic_text_disagreement_predictor_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English schem_roberta_demographic_text_disagreement_predictor RoBertaForSequenceClassification from RuyuanWan +author: John Snow Labs +name: schem_roberta_demographic_text_disagreement_predictor +date: 2024-09-15 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`schem_roberta_demographic_text_disagreement_predictor` is a English model originally trained by RuyuanWan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/schem_roberta_demographic_text_disagreement_predictor_en_5.5.0_3.0_1726401504408.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/schem_roberta_demographic_text_disagreement_predictor_en_5.5.0_3.0_1726401504408.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("schem_roberta_demographic_text_disagreement_predictor","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("schem_roberta_demographic_text_disagreement_predictor", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|schem_roberta_demographic_text_disagreement_predictor| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|422.8 MB| + +## References + +https://huggingface.co/RuyuanWan/SChem_RoBERTa_Demographic-text_Disagreement_Predictor \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-sent_kannada_bert_kn.md b/docs/_posts/ahmedlone127/2024-09-15-sent_kannada_bert_kn.md new file mode 100644 index 00000000000000..a1b4d2d4b136dd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-sent_kannada_bert_kn.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Kannada sent_kannada_bert BertSentenceEmbeddings from l3cube-pune +author: John Snow Labs +name: sent_kannada_bert +date: 2024-09-15 +tags: [kn, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: kn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_kannada_bert` is a Kannada model originally trained by l3cube-pune. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_kannada_bert_kn_5.5.0_3.0_1726443487561.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_kannada_bert_kn_5.5.0_3.0_1726443487561.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_kannada_bert","kn") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_kannada_bert","kn") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_kannada_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|kn| +|Size:|890.5 MB| + +## References + +https://huggingface.co/l3cube-pune/kannada-bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-15-test_rrrrrrrita_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-15-test_rrrrrrrita_pipeline_en.md new file mode 100644 index 00000000000000..620016743feed3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-15-test_rrrrrrrita_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English test_rrrrrrrita_pipeline pipeline DistilBertForSequenceClassification from Rrrrrrrita +author: John Snow Labs +name: test_rrrrrrrita_pipeline +date: 2024-09-15 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_rrrrrrrita_pipeline` is a English model originally trained by Rrrrrrrita. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_rrrrrrrita_pipeline_en_5.5.0_3.0_1726366512741.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_rrrrrrrita_pipeline_en_5.5.0_3.0_1726366512741.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test_rrrrrrrita_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test_rrrrrrrita_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_rrrrrrrita_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Rrrrrrrita/test + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-albert_finetuned_tweet_eval_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-albert_finetuned_tweet_eval_pipeline_en.md new file mode 100644 index 00000000000000..bcce8ff176c55b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-albert_finetuned_tweet_eval_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English albert_finetuned_tweet_eval_pipeline pipeline AlbertForSequenceClassification from iaminhridoy +author: John Snow Labs +name: albert_finetuned_tweet_eval_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_finetuned_tweet_eval_pipeline` is a English model originally trained by iaminhridoy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_finetuned_tweet_eval_pipeline_en_5.5.0_3.0_1726523516421.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_finetuned_tweet_eval_pipeline_en_5.5.0_3.0_1726523516421.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("albert_finetuned_tweet_eval_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("albert_finetuned_tweet_eval_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_finetuned_tweet_eval_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|44.2 MB| + +## References + +https://huggingface.co/iaminhridoy/AlBert-finetuned-Tweet_Eval + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-bankstatementmodelver7_en.md b/docs/_posts/ahmedlone127/2024-09-16-bankstatementmodelver7_en.md new file mode 100644 index 00000000000000..da84ae506f5d94 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-bankstatementmodelver7_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bankstatementmodelver7 RoBertaForQuestionAnswering from Souvik123 +author: John Snow Labs +name: bankstatementmodelver7 +date: 2024-09-16 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bankstatementmodelver7` is a English model originally trained by Souvik123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bankstatementmodelver7_en_5.5.0_3.0_1726460637592.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bankstatementmodelver7_en_5.5.0_3.0_1726460637592.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = RoBertaForQuestionAnswering.pretrained("bankstatementmodelver7","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = RoBertaForQuestionAnswering.pretrained("bankstatementmodelver7", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bankstatementmodelver7| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|463.7 MB| + +## References + +https://huggingface.co/Souvik123/bankstatementmodelver7 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-consumerresponseclassifier_en.md b/docs/_posts/ahmedlone127/2024-09-16-consumerresponseclassifier_en.md new file mode 100644 index 00000000000000..b7fb72798ae52c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-consumerresponseclassifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English consumerresponseclassifier RoBertaForSequenceClassification from ahaanlimaye +author: John Snow Labs +name: consumerresponseclassifier +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`consumerresponseclassifier` is a English model originally trained by ahaanlimaye. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/consumerresponseclassifier_en_5.5.0_3.0_1726504833456.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/consumerresponseclassifier_en_5.5.0_3.0_1726504833456.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("consumerresponseclassifier","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("consumerresponseclassifier", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|consumerresponseclassifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|424.9 MB| + +## References + +https://huggingface.co/ahaanlimaye/ConsumerResponseClassifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tkoyama_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tkoyama_pipeline_en.md new file mode 100644 index 00000000000000..d6db063017b38a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tkoyama_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tkoyama_pipeline pipeline MarianTransformer from tkoyama +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tkoyama_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tkoyama_pipeline` is a English model originally trained by tkoyama. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tkoyama_pipeline_en_5.5.0_3.0_1726494320116.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tkoyama_pipeline_en_5.5.0_3.0_1726494320116.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tkoyama_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tkoyama_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_accelerate_tkoyama_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|508.7 MB| + +## References + +https://huggingface.co/tkoyama/marian-finetuned-kde4-en-to-fr-accelerate + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-suicide_distilbert_2_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-16-suicide_distilbert_2_5_pipeline_en.md new file mode 100644 index 00000000000000..4c363dfe8a0456 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-suicide_distilbert_2_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English suicide_distilbert_2_5_pipeline pipeline DistilBertForSequenceClassification from cuadron11 +author: John Snow Labs +name: suicide_distilbert_2_5_pipeline +date: 2024-09-16 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`suicide_distilbert_2_5_pipeline` is a English model originally trained by cuadron11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/suicide_distilbert_2_5_pipeline_en_5.5.0_3.0_1726506828788.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/suicide_distilbert_2_5_pipeline_en_5.5.0_3.0_1726506828788.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("suicide_distilbert_2_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("suicide_distilbert_2_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|suicide_distilbert_2_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/cuadron11/suicide-distilbert-2-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_spanish_zuazo_es.md b/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_spanish_zuazo_es.md new file mode 100644 index 00000000000000..810bb03c69ba72 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-whisper_tiny_spanish_zuazo_es.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Castilian, Spanish whisper_tiny_spanish_zuazo WhisperForCTC from zuazo +author: John Snow Labs +name: whisper_tiny_spanish_zuazo +date: 2024-09-16 +tags: [es, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_spanish_zuazo` is a Castilian, Spanish model originally trained by zuazo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_spanish_zuazo_es_5.5.0_3.0_1726486380554.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_spanish_zuazo_es_5.5.0_3.0_1726486380554.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_tiny_spanish_zuazo","es") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_tiny_spanish_zuazo", "es") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_spanish_zuazo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|es| +|Size:|390.5 MB| + +## References + +https://huggingface.co/zuazo/whisper-tiny-es \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-withinapps_ndd_addressbook_test_content_tags_cwadj_en.md b/docs/_posts/ahmedlone127/2024-09-16-withinapps_ndd_addressbook_test_content_tags_cwadj_en.md new file mode 100644 index 00000000000000..40eb154f68cf8d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-withinapps_ndd_addressbook_test_content_tags_cwadj_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English withinapps_ndd_addressbook_test_content_tags_cwadj DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: withinapps_ndd_addressbook_test_content_tags_cwadj +date: 2024-09-16 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`withinapps_ndd_addressbook_test_content_tags_cwadj` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/withinapps_ndd_addressbook_test_content_tags_cwadj_en_5.5.0_3.0_1726525645004.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/withinapps_ndd_addressbook_test_content_tags_cwadj_en_5.5.0_3.0_1726525645004.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("withinapps_ndd_addressbook_test_content_tags_cwadj","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("withinapps_ndd_addressbook_test_content_tags_cwadj", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|withinapps_ndd_addressbook_test_content_tags_cwadj| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/WITHINAPPS_NDD-addressbook_test-content_tags-CWAdj \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_french_francois2511_en.md b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_french_francois2511_en.md new file mode 100644 index 00000000000000..e203e8eaf5b381 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-16-xlm_roberta_base_finetuned_panx_german_french_francois2511_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_francois2511 XlmRoBertaForTokenClassification from Francois2511 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_francois2511 +date: 2024-09-16 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_francois2511` is a English model originally trained by Francois2511. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_francois2511_en_5.5.0_3.0_1726496995627.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_francois2511_en_5.5.0_3.0_1726496995627.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_francois2511","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_francois2511", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_francois2511| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/Francois2511/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-discourse_prediction__basic_en.md b/docs/_posts/ahmedlone127/2024-09-17-discourse_prediction__basic_en.md new file mode 100644 index 00000000000000..3c8c9ee47d2e5f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-discourse_prediction__basic_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English discourse_prediction__basic AlbertForSequenceClassification from alex2awesome +author: John Snow Labs +name: discourse_prediction__basic +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, albert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: AlbertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`discourse_prediction__basic` is a English model originally trained by alex2awesome. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/discourse_prediction__basic_en_5.5.0_3.0_1726600650802.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/discourse_prediction__basic_en_5.5.0_3.0_1726600650802.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = AlbertForSequenceClassification.pretrained("discourse_prediction__basic","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = AlbertForSequenceClassification.pretrained("discourse_prediction__basic", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|discourse_prediction__basic| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|834.0 MB| + +## References + +https://huggingface.co/alex2awesome/discourse-prediction__basic \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_german_cased_v1_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_german_cased_v1_en.md new file mode 100644 index 00000000000000..2c95f404a20dda --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_german_cased_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_german_cased_v1 DistilBertForSequenceClassification from mserloth +author: John Snow Labs +name: distilbert_base_german_cased_v1 +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_german_cased_v1` is a English model originally trained by mserloth. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_german_cased_v1_en_5.5.0_3.0_1726584885995.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_german_cased_v1_en_5.5.0_3.0_1726584885995.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_german_cased_v1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_german_cased_v1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_german_cased_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|252.5 MB| + +## References + +https://huggingface.co/mserloth/distilbert-base-german-cased-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_clinc_seddiktrk_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_clinc_seddiktrk_pipeline_en.md new file mode 100644 index 00000000000000..00c7defa756117 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_finetuned_clinc_seddiktrk_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_seddiktrk_pipeline pipeline DistilBertForSequenceClassification from seddiktrk +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_seddiktrk_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_seddiktrk_pipeline` is a English model originally trained by seddiktrk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_seddiktrk_pipeline_en_5.5.0_3.0_1726584489371.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_seddiktrk_pipeline_en_5.5.0_3.0_1726584489371.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_seddiktrk_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_seddiktrk_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_seddiktrk_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/seddiktrk/distilbert-base-uncased-finetuned-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge17_simsp400_clean300_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge17_simsp400_clean300_pipeline_en.md new file mode 100644 index 00000000000000..d99c4418fcd646 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge17_simsp400_clean300_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge17_simsp400_clean300_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge17_simsp400_clean300_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge17_simsp400_clean300_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge17_simsp400_clean300_pipeline_en_5.5.0_3.0_1726593761649.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge17_simsp400_clean300_pipeline_en_5.5.0_3.0_1726593761649.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge17_simsp400_clean300_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge17_simsp400_clean300_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st17sd_ut72ut1_plprefix0stlarge17_simsp400_clean300_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st17sd_ut72ut1_PLPrefix0stlarge17_simsp400_clean300 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_squad2_p70_en.md b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_squad2_p70_en.md new file mode 100644 index 00000000000000..8867ba8eaa8785 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-distilbert_base_uncased_squad2_p70_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_squad2_p70 DistilBertForQuestionAnswering from pminha +author: John Snow Labs +name: distilbert_base_uncased_squad2_p70 +date: 2024-09-17 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_squad2_p70` is a English model originally trained by pminha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_p70_en_5.5.0_3.0_1726575072029.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_squad2_p70_en_5.5.0_3.0_1726575072029.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_squad2_p70","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_squad2_p70", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_squad2_p70| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|155.3 MB| + +## References + +https://huggingface.co/pminha/distilbert-base-uncased-squad2-p70 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-fine_tuned_albert_tweets_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-fine_tuned_albert_tweets_pipeline_en.md new file mode 100644 index 00000000000000..0f9a9302d38215 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-fine_tuned_albert_tweets_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English fine_tuned_albert_tweets_pipeline pipeline AlbertForSequenceClassification from imsarfaroz +author: John Snow Labs +name: fine_tuned_albert_tweets_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tuned_albert_tweets_pipeline` is a English model originally trained by imsarfaroz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tuned_albert_tweets_pipeline_en_5.5.0_3.0_1726614199582.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tuned_albert_tweets_pipeline_en_5.5.0_3.0_1726614199582.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("fine_tuned_albert_tweets_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("fine_tuned_albert_tweets_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tuned_albert_tweets_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|44.2 MB| + +## References + +https://huggingface.co/imsarfaroz/fine-tuned-albert-tweets + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-interlingua_multilingual_transliterated_roberta_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-17-interlingua_multilingual_transliterated_roberta_pipeline_xx.md new file mode 100644 index 00000000000000..34fc4bc2aab69a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-interlingua_multilingual_transliterated_roberta_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual interlingua_multilingual_transliterated_roberta_pipeline pipeline RoBertaEmbeddings from ibm +author: John Snow Labs +name: interlingua_multilingual_transliterated_roberta_pipeline +date: 2024-09-17 +tags: [xx, open_source, pipeline, onnx] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`interlingua_multilingual_transliterated_roberta_pipeline` is a Multilingual model originally trained by ibm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/interlingua_multilingual_transliterated_roberta_pipeline_xx_5.5.0_3.0_1726595983328.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/interlingua_multilingual_transliterated_roberta_pipeline_xx_5.5.0_3.0_1726595983328.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("interlingua_multilingual_transliterated_roberta_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("interlingua_multilingual_transliterated_roberta_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|interlingua_multilingual_transliterated_roberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|638.6 MB| + +## References + +https://huggingface.co/ibm/ia-multilingual-transliterated-roberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-marian_finetuned_gw_chinese_tonga_tonga_islands_english_accelerate_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-marian_finetuned_gw_chinese_tonga_tonga_islands_english_accelerate_pipeline_en.md new file mode 100644 index 00000000000000..025ceb0fe8146e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-marian_finetuned_gw_chinese_tonga_tonga_islands_english_accelerate_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English marian_finetuned_gw_chinese_tonga_tonga_islands_english_accelerate_pipeline pipeline MarianTransformer from Pinkky +author: John Snow Labs +name: marian_finetuned_gw_chinese_tonga_tonga_islands_english_accelerate_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_gw_chinese_tonga_tonga_islands_english_accelerate_pipeline` is a English model originally trained by Pinkky. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_gw_chinese_tonga_tonga_islands_english_accelerate_pipeline_en_5.5.0_3.0_1726582050985.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_gw_chinese_tonga_tonga_islands_english_accelerate_pipeline_en_5.5.0_3.0_1726582050985.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marian_finetuned_gw_chinese_tonga_tonga_islands_english_accelerate_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marian_finetuned_gw_chinese_tonga_tonga_islands_english_accelerate_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_gw_chinese_tonga_tonga_islands_english_accelerate_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|540.8 MB| + +## References + +https://huggingface.co/Pinkky/marian-finetuned-gw-zh-to-en-accelerate + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-marian_finetuned_kde4_english_tonga_tonga_islands_french_kingxue_en.md b/docs/_posts/ahmedlone127/2024-09-17-marian_finetuned_kde4_english_tonga_tonga_islands_french_kingxue_en.md new file mode 100644 index 00000000000000..5913d48474e17f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-marian_finetuned_kde4_english_tonga_tonga_islands_french_kingxue_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English marian_finetuned_kde4_english_tonga_tonga_islands_french_kingxue MarianTransformer from kingxue +author: John Snow Labs +name: marian_finetuned_kde4_english_tonga_tonga_islands_french_kingxue +date: 2024-09-17 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marian_finetuned_kde4_english_tonga_tonga_islands_french_kingxue` is a English model originally trained by kingxue. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_kingxue_en_5.5.0_3.0_1726599033403.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marian_finetuned_kde4_english_tonga_tonga_islands_french_kingxue_en_5.5.0_3.0_1726599033403.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("translation") + +marian = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_kingxue","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val marian = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = MarianTransformer.pretrained("marian_finetuned_kde4_english_tonga_tonga_islands_french_kingxue","en") + .setInputCols(Array("sentence")) + .setOutputCol("translation") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marian_finetuned_kde4_english_tonga_tonga_islands_french_kingxue| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|508.3 MB| + +## References + +https://huggingface.co/kingxue/marian-finetuned-kde4-en-to-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1_pipeline_en.md new file mode 100644 index 00000000000000..5b0fd34dfc906d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1_pipeline pipeline MarianTransformer from maaaaaa1 +author: John Snow Labs +name: opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1_pipeline` is a English model originally trained by maaaaaa1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1_pipeline_en_5.5.0_3.0_1726533358971.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1_pipeline_en_5.5.0_3.0_1726533358971.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_spanish_finetuned_english_tonga_tonga_islands_spanish_maaaaaa1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|540.6 MB| + +## References + +https://huggingface.co/maaaaaa1/opus-mt-en-es-finetuned-en-to-es + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-sent_parlbert_german_law_de.md b/docs/_posts/ahmedlone127/2024-09-17-sent_parlbert_german_law_de.md new file mode 100644 index 00000000000000..d3207bebb2ea98 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-sent_parlbert_german_law_de.md @@ -0,0 +1,94 @@ +--- +layout: model +title: German sent_parlbert_german_law BertSentenceEmbeddings from InfAI +author: John Snow Labs +name: sent_parlbert_german_law +date: 2024-09-17 +tags: [de, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_parlbert_german_law` is a German model originally trained by InfAI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_parlbert_german_law_de_5.5.0_3.0_1726607250426.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_parlbert_german_law_de_5.5.0_3.0_1726607250426.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_parlbert_german_law","de") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_parlbert_german_law","de") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_parlbert_german_law| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|de| +|Size:|406.8 MB| + +## References + +https://huggingface.co/InfAI/parlbert-german-law \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-translator_italian_english_en.md b/docs/_posts/ahmedlone127/2024-09-17-translator_italian_english_en.md new file mode 100644 index 00000000000000..5e578e277b69fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-translator_italian_english_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English translator_italian_english MarianTransformer from zaneas +author: John Snow Labs +name: translator_italian_english +date: 2024-09-17 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`translator_italian_english` is a English model originally trained by zaneas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/translator_italian_english_en_5.5.0_3.0_1726582222863.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/translator_italian_english_en_5.5.0_3.0_1726582222863.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("translation") + +marian = MarianTransformer.pretrained("translator_italian_english","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val marian = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = MarianTransformer.pretrained("translator_italian_english","en") + .setInputCols(Array("sentence")) + .setOutputCol("translation") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|translator_italian_english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|625.0 MB| + +## References + +https://huggingface.co/zaneas/translator_IT_EN \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_base_vtlustos_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_base_vtlustos_pipeline_en.md new file mode 100644 index 00000000000000..11fffcb2e19308 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_base_vtlustos_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_base_vtlustos_pipeline pipeline WhisperForCTC from vtlustos +author: John Snow Labs +name: whisper_base_vtlustos_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_vtlustos_pipeline` is a English model originally trained by vtlustos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_vtlustos_pipeline_en_5.5.0_3.0_1726568333577.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_vtlustos_pipeline_en_5.5.0_3.0_1726568333577.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_base_vtlustos_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_base_vtlustos_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_vtlustos_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|643.7 MB| + +## References + +https://huggingface.co/vtlustos/whisper-base + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-whisper_small_vietmed_free_e3_11_pipeline_vi.md b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_vietmed_free_e3_11_pipeline_vi.md new file mode 100644 index 00000000000000..6f2c074894baf3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-whisper_small_vietmed_free_e3_11_pipeline_vi.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Vietnamese whisper_small_vietmed_free_e3_11_pipeline pipeline WhisperForCTC from Hanhpt23 +author: John Snow Labs +name: whisper_small_vietmed_free_e3_11_pipeline +date: 2024-09-17 +tags: [vi, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: vi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_vietmed_free_e3_11_pipeline` is a Vietnamese model originally trained by Hanhpt23. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_vietmed_free_e3_11_pipeline_vi_5.5.0_3.0_1726565568518.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_vietmed_free_e3_11_pipeline_vi_5.5.0_3.0_1726565568518.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_vietmed_free_e3_11_pipeline", lang = "vi") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_vietmed_free_e3_11_pipeline", lang = "vi") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_vietmed_free_e3_11_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|vi| +|Size:|1.6 GB| + +## References + +https://huggingface.co/Hanhpt23/whisper-small-vietmed-free_E3-11 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_all_param_mehta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_all_param_mehta_pipeline_en.md new file mode 100644 index 00000000000000..ea291a5160934d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_all_param_mehta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_param_mehta_pipeline pipeline XlmRoBertaForTokenClassification from param-mehta +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_param_mehta_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_param_mehta_pipeline` is a English model originally trained by param-mehta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_param_mehta_pipeline_en_5.5.0_3.0_1726577136893.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_param_mehta_pipeline_en_5.5.0_3.0_1726577136893.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_param_mehta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_param_mehta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_param_mehta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|849.3 MB| + +## References + +https://huggingface.co/param-mehta/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_german_anas1997_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_german_anas1997_pipeline_en.md new file mode 100644 index 00000000000000..0e48de9c162dc2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_finetuned_panx_german_anas1997_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_anas1997_pipeline pipeline XlmRoBertaForTokenClassification from Anas1997 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_anas1997_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_anas1997_pipeline` is a English model originally trained by Anas1997. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_anas1997_pipeline_en_5.5.0_3.0_1726611175679.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_anas1997_pipeline_en_5.5.0_3.0_1726611175679.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_anas1997_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_anas1997_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_anas1997_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/Anas1997/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_lr2e_05_seed42_kinyarwanda_amh_eng_train_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_lr2e_05_seed42_kinyarwanda_amh_eng_train_en.md new file mode 100644 index 00000000000000..db525bc34b1e91 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_base_lr2e_05_seed42_kinyarwanda_amh_eng_train_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_lr2e_05_seed42_kinyarwanda_amh_eng_train XlmRoBertaForSequenceClassification from shanhy +author: John Snow Labs +name: xlm_roberta_base_lr2e_05_seed42_kinyarwanda_amh_eng_train +date: 2024-09-17 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_lr2e_05_seed42_kinyarwanda_amh_eng_train` is a English model originally trained by shanhy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr2e_05_seed42_kinyarwanda_amh_eng_train_en_5.5.0_3.0_1726616327412.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr2e_05_seed42_kinyarwanda_amh_eng_train_en_5.5.0_3.0_1726616327412.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_lr2e_05_seed42_kinyarwanda_amh_eng_train","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_lr2e_05_seed42_kinyarwanda_amh_eng_train", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_lr2e_05_seed42_kinyarwanda_amh_eng_train| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|802.9 MB| + +## References + +https://huggingface.co/shanhy/xlm-roberta-base_lr2e-05_seed42_kin-amh-eng_train \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_finetuned_emojis_cen_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_finetuned_emojis_cen_2_pipeline_en.md new file mode 100644 index 00000000000000..4d9c116a28dba3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-17-xlm_roberta_finetuned_emojis_cen_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_finetuned_emojis_cen_2_pipeline pipeline XlmRoBertaForSequenceClassification from Karim-Gamal +author: John Snow Labs +name: xlm_roberta_finetuned_emojis_cen_2_pipeline +date: 2024-09-17 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_finetuned_emojis_cen_2_pipeline` is a English model originally trained by Karim-Gamal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_finetuned_emojis_cen_2_pipeline_en_5.5.0_3.0_1726536719294.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_finetuned_emojis_cen_2_pipeline_en_5.5.0_3.0_1726536719294.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_finetuned_emojis_cen_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_finetuned_emojis_cen_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_finetuned_emojis_cen_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/Karim-Gamal/XLM-Roberta-finetuned-emojis-cen-2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-1104_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-1104_pipeline_en.md new file mode 100644 index 00000000000000..bdf8866f90c39e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-1104_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 1104_pipeline pipeline DistilBertForSequenceClassification from tingchih +author: John Snow Labs +name: 1104_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`1104_pipeline` is a English model originally trained by tingchih. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/1104_pipeline_en_5.5.0_3.0_1726625528138.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/1104_pipeline_en_5.5.0_3.0_1726625528138.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("1104_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("1104_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|1104_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tingchih/1104 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-berit_2000_enriched_optimized_en.md b/docs/_posts/ahmedlone127/2024-09-18-berit_2000_enriched_optimized_en.md new file mode 100644 index 00000000000000..6db665e18e1557 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-berit_2000_enriched_optimized_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English berit_2000_enriched_optimized RoBertaEmbeddings from gngpostalsrvc +author: John Snow Labs +name: berit_2000_enriched_optimized +date: 2024-09-18 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`berit_2000_enriched_optimized` is a English model originally trained by gngpostalsrvc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/berit_2000_enriched_optimized_en_5.5.0_3.0_1726678777982.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/berit_2000_enriched_optimized_en_5.5.0_3.0_1726678777982.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("berit_2000_enriched_optimized","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("berit_2000_enriched_optimized","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|berit_2000_enriched_optimized| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|464.9 MB| + +## References + +https://huggingface.co/gngpostalsrvc/BERiT_2000_enriched_optimized \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-bert_base_cased_qqp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-bert_base_cased_qqp_pipeline_en.md new file mode 100644 index 00000000000000..6c6b47d1129b99 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-bert_base_cased_qqp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_cased_qqp_pipeline pipeline BertForSequenceClassification from WillHeld +author: John Snow Labs +name: bert_base_cased_qqp_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_qqp_pipeline` is a English model originally trained by WillHeld. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_qqp_pipeline_en_5.5.0_3.0_1726623676136.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_qqp_pipeline_en_5.5.0_3.0_1726623676136.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_cased_qqp_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_cased_qqp_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_qqp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.0 MB| + +## References + +https://huggingface.co/WillHeld/bert-base-cased-qqp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_sklug_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_sklug_pipeline_en.md new file mode 100644 index 00000000000000..570f48b889f3a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-burmese_awesome_model_sklug_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_sklug_pipeline pipeline DistilBertForSequenceClassification from sklug +author: John Snow Labs +name: burmese_awesome_model_sklug_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_sklug_pipeline` is a English model originally trained by sklug. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_sklug_pipeline_en_5.5.0_3.0_1726630283720.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_sklug_pipeline_en_5.5.0_3.0_1726630283720.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_sklug_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_sklug_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_sklug_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sklug/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-clasificadormotivomora10_en.md b/docs/_posts/ahmedlone127/2024-09-18-clasificadormotivomora10_en.md new file mode 100644 index 00000000000000..eb4334f4ce8aca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-clasificadormotivomora10_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English clasificadormotivomora10 RoBertaForSequenceClassification from Arodrigo +author: John Snow Labs +name: clasificadormotivomora10 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`clasificadormotivomora10` is a English model originally trained by Arodrigo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clasificadormotivomora10_en_5.5.0_3.0_1726627794340.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clasificadormotivomora10_en_5.5.0_3.0_1726627794340.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("clasificadormotivomora10","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("clasificadormotivomora10", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clasificadormotivomora10| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|464.5 MB| + +## References + +https://huggingface.co/Arodrigo/ClasificadorMotivoMora10 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-clinicalbertqa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-clinicalbertqa_pipeline_en.md new file mode 100644 index 00000000000000..7545282372533f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-clinicalbertqa_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English clinicalbertqa_pipeline pipeline BertForQuestionAnswering from lanzv +author: John Snow Labs +name: clinicalbertqa_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`clinicalbertqa_pipeline` is a English model originally trained by lanzv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clinicalbertqa_pipeline_en_5.5.0_3.0_1726667918468.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clinicalbertqa_pipeline_en_5.5.0_3.0_1726667918468.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("clinicalbertqa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("clinicalbertqa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinicalbertqa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.3 MB| + +## References + +https://huggingface.co/lanzv/ClinicalBERTQA + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-cold_fusion_itr24_seed1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-cold_fusion_itr24_seed1_pipeline_en.md new file mode 100644 index 00000000000000..4b5c7fae850890 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-cold_fusion_itr24_seed1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cold_fusion_itr24_seed1_pipeline pipeline RoBertaForSequenceClassification from ibm +author: John Snow Labs +name: cold_fusion_itr24_seed1_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cold_fusion_itr24_seed1_pipeline` is a English model originally trained by ibm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cold_fusion_itr24_seed1_pipeline_en_5.5.0_3.0_1726628210809.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cold_fusion_itr24_seed1_pipeline_en_5.5.0_3.0_1726628210809.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cold_fusion_itr24_seed1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cold_fusion_itr24_seed1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cold_fusion_itr24_seed1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.0 MB| + +## References + +https://huggingface.co/ibm/ColD-Fusion-itr24-seed1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_enneagram_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_enneagram_pipeline_en.md new file mode 100644 index 00000000000000..717347c3f530e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_enneagram_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_enneagram_pipeline pipeline DistilBertForSequenceClassification from LandersonMiguel +author: John Snow Labs +name: distilbert_base_uncased_enneagram_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_enneagram_pipeline` is a English model originally trained by LandersonMiguel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_enneagram_pipeline_en_5.5.0_3.0_1726669989474.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_enneagram_pipeline_en_5.5.0_3.0_1726669989474.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_enneagram_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_enneagram_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_enneagram_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LandersonMiguel/distilbert-base-uncased-enneagram + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_clinc_cheng_cherry_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_clinc_cheng_cherry_pipeline_en.md new file mode 100644 index 00000000000000..dc51c4c43aa8fd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_clinc_cheng_cherry_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_cheng_cherry_pipeline pipeline DistilBertForSequenceClassification from cheng-cherry +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_cheng_cherry_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_cheng_cherry_pipeline` is a English model originally trained by cheng-cherry. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_cheng_cherry_pipeline_en_5.5.0_3.0_1726681591363.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_cheng_cherry_pipeline_en_5.5.0_3.0_1726681591363.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_cheng_cherry_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_cheng_cherry_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_cheng_cherry_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/cheng-cherry/distilbert-base-uncased-finetuned-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_santosale_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_santosale_pipeline_en.md new file mode 100644 index 00000000000000..5d6f0b8afb6d96 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_cola_santosale_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_santosale_pipeline pipeline DistilBertForSequenceClassification from santosale +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_santosale_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_santosale_pipeline` is a English model originally trained by santosale. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_santosale_pipeline_en_5.5.0_3.0_1726676766725.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_santosale_pipeline_en_5.5.0_3.0_1726676766725.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_santosale_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_santosale_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_santosale_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/santosale/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_arvindsinghmanhas_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_arvindsinghmanhas_pipeline_en.md new file mode 100644 index 00000000000000..bd863e97dcc143 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_arvindsinghmanhas_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_arvindsinghmanhas_pipeline pipeline DistilBertForSequenceClassification from arvindsinghmanhas +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_arvindsinghmanhas_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_arvindsinghmanhas_pipeline` is a English model originally trained by arvindsinghmanhas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_arvindsinghmanhas_pipeline_en_5.5.0_3.0_1726680412193.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_arvindsinghmanhas_pipeline_en_5.5.0_3.0_1726680412193.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_arvindsinghmanhas_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_arvindsinghmanhas_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_arvindsinghmanhas_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/arvindsinghmanhas/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_helloyeew_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_helloyeew_en.md new file mode 100644 index 00000000000000..37f39938f0cf22 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_helloyeew_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_helloyeew DistilBertForSequenceClassification from helloyeew +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_helloyeew +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_helloyeew` is a English model originally trained by helloyeew. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_helloyeew_en_5.5.0_3.0_1726696432771.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_helloyeew_en_5.5.0_3.0_1726696432771.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_helloyeew","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_helloyeew", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_helloyeew| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/helloyeew/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_ujjwalgarg_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_ujjwalgarg_en.md new file mode 100644 index 00000000000000..a1a953843fccfd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_emotion_ujjwalgarg_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_ujjwalgarg DistilBertForSequenceClassification from ujjwalgarg +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_ujjwalgarg +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_ujjwalgarg` is a English model originally trained by ujjwalgarg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ujjwalgarg_en_5.5.0_3.0_1726695150871.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ujjwalgarg_en_5.5.0_3.0_1726695150871.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_ujjwalgarg","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_ujjwalgarg", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_ujjwalgarg| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ujjwalgarg/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_d5716d28_simon580803_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_d5716d28_simon580803_en.md new file mode 100644 index 00000000000000..c8e6483ffafa4e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_d5716d28_simon580803_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_d5716d28_simon580803 DistilBertForQuestionAnswering from Simon580803 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_d5716d28_simon580803 +date: 2024-09-18 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_d5716d28_simon580803` is a English model originally trained by Simon580803. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_simon580803_en_5.5.0_3.0_1726644152760.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_d5716d28_simon580803_en_5.5.0_3.0_1726644152760.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_d5716d28_simon580803","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_uncased_finetuned_squad_d5716d28_simon580803", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_d5716d28_simon580803| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Simon580803/distilbert-base-uncased-finetuned-squad-d5716d28 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_sanghakoh_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_sanghakoh_pipeline_en.md new file mode 100644 index 00000000000000..89805e18d56d24 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_finetuned_squad_sanghakoh_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_squad_sanghakoh_pipeline pipeline DistilBertForQuestionAnswering from sanghakoh +author: John Snow Labs +name: distilbert_base_uncased_finetuned_squad_sanghakoh_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_squad_sanghakoh_pipeline` is a English model originally trained by sanghakoh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_sanghakoh_pipeline_en_5.5.0_3.0_1726640697607.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_squad_sanghakoh_pipeline_en_5.5.0_3.0_1726640697607.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_squad_sanghakoh_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_squad_sanghakoh_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_squad_sanghakoh_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/sanghakoh/distilbert-base-uncased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st10sd_ut72ut1large10pfxnf_simsp400_clean200_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st10sd_ut72ut1large10pfxnf_simsp400_clean200_pipeline_en.md new file mode 100644 index 00000000000000..d02c70cd67226d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st10sd_ut72ut1large10pfxnf_simsp400_clean200_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st10sd_ut72ut1large10pfxnf_simsp400_clean200_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st10sd_ut72ut1large10pfxnf_simsp400_clean200_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st10sd_ut72ut1large10pfxnf_simsp400_clean200_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st10sd_ut72ut1large10pfxnf_simsp400_clean200_pipeline_en_5.5.0_3.0_1726680433504.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st10sd_ut72ut1large10pfxnf_simsp400_clean200_pipeline_en_5.5.0_3.0_1726680433504.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st10sd_ut72ut1large10pfxnf_simsp400_clean200_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st10sd_ut72ut1large10pfxnf_simsp400_clean200_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st10sd_ut72ut1large10pfxnf_simsp400_clean200_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st10sd_ut72ut1large10PfxNf_simsp400_clean200 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st1sd_ut12ut1_plprefix0stlarge_simsp100_en.md b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st1sd_ut12ut1_plprefix0stlarge_simsp100_en.md new file mode 100644 index 00000000000000..26b29d7b36529c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-distilbert_base_uncased_odm_zphr_0st1sd_ut12ut1_plprefix0stlarge_simsp100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st1sd_ut12ut1_plprefix0stlarge_simsp100 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st1sd_ut12ut1_plprefix0stlarge_simsp100 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st1sd_ut12ut1_plprefix0stlarge_simsp100` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st1sd_ut12ut1_plprefix0stlarge_simsp100_en_5.5.0_3.0_1726681925088.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st1sd_ut12ut1_plprefix0stlarge_simsp100_en_5.5.0_3.0_1726681925088.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st1sd_ut12ut1_plprefix0stlarge_simsp100","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st1sd_ut12ut1_plprefix0stlarge_simsp100", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st1sd_ut12ut1_plprefix0stlarge_simsp100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st1sd_ut12ut1_PLPrefix0stlarge_simsp100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_3000_samples_bberken_en.md b/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_3000_samples_bberken_en.md new file mode 100644 index 00000000000000..1c7873a6afb5ad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_3000_samples_bberken_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_bberken DistilBertForSequenceClassification from bberken +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_bberken +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_bberken` is a English model originally trained by bberken. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_bberken_en_5.5.0_3.0_1726680609942.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_bberken_en_5.5.0_3.0_1726680609942.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_bberken","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_bberken", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_bberken| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/bberken/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_social_media_en.md b/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_social_media_en.md new file mode 100644 index 00000000000000..136cf5a66e4e50 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-finetuning_sentiment_model_social_media_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_social_media DistilBertForSequenceClassification from MariaChzhen +author: John Snow Labs +name: finetuning_sentiment_model_social_media +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_social_media` is a English model originally trained by MariaChzhen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_social_media_en_5.5.0_3.0_1726625253472.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_social_media_en_5.5.0_3.0_1726625253472.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_social_media","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_social_media", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_social_media| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/MariaChzhen/finetuning-sentiment-model-social-media \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-finnews_sentimentanalysis_v3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-finnews_sentimentanalysis_v3_pipeline_en.md new file mode 100644 index 00000000000000..4d71f04cf68fce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-finnews_sentimentanalysis_v3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finnews_sentimentanalysis_v3_pipeline pipeline DistilBertForSequenceClassification from ZephyruSalsify +author: John Snow Labs +name: finnews_sentimentanalysis_v3_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finnews_sentimentanalysis_v3_pipeline` is a English model originally trained by ZephyruSalsify. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finnews_sentimentanalysis_v3_pipeline_en_5.5.0_3.0_1726681580654.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finnews_sentimentanalysis_v3_pipeline_en_5.5.0_3.0_1726681580654.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finnews_sentimentanalysis_v3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finnews_sentimentanalysis_v3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finnews_sentimentanalysis_v3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/ZephyruSalsify/FinNews_SentimentAnalysis_v3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-mrlincolnberta_en.md b/docs/_posts/ahmedlone127/2024-09-18-mrlincolnberta_en.md new file mode 100644 index 00000000000000..a3db79bc219652 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-mrlincolnberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mrlincolnberta RoBertaEmbeddings from BigSalmon +author: John Snow Labs +name: mrlincolnberta +date: 2024-09-18 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mrlincolnberta` is a English model originally trained by BigSalmon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mrlincolnberta_en_5.5.0_3.0_1726651329667.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mrlincolnberta_en_5.5.0_3.0_1726651329667.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("mrlincolnberta","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("mrlincolnberta","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mrlincolnberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.2 MB| + +## References + +https://huggingface.co/BigSalmon/MrLincolnBerta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_base_epoch_73_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_epoch_73_pipeline_en.md new file mode 100644 index 00000000000000..7eb2c18a84429e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_epoch_73_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_epoch_73_pipeline pipeline RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_73_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_73_pipeline` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_73_pipeline_en_5.5.0_3.0_1726651997394.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_73_pipeline_en_5.5.0_3.0_1726651997394.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_epoch_73_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_epoch_73_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_73_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_73 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_base_pretrained_marathi_marh_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_pretrained_marathi_marh_2_pipeline_en.md new file mode 100644 index 00000000000000..eb3d6b881d95a5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_base_pretrained_marathi_marh_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_pretrained_marathi_marh_2_pipeline pipeline RoBertaEmbeddings from DeadBeast +author: John Snow Labs +name: roberta_base_pretrained_marathi_marh_2_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_pretrained_marathi_marh_2_pipeline` is a English model originally trained by DeadBeast. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_pretrained_marathi_marh_2_pipeline_en_5.5.0_3.0_1726651640689.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_pretrained_marathi_marh_2_pipeline_en_5.5.0_3.0_1726651640689.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_pretrained_marathi_marh_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_pretrained_marathi_marh_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_pretrained_marathi_marh_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/DeadBeast/roberta-base-pretrained-mr-2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_english_annualreport_tuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_english_annualreport_tuned_pipeline_en.md new file mode 100644 index 00000000000000..e230b55f16cffe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_english_annualreport_tuned_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_english_annualreport_tuned_pipeline pipeline RoBertaEmbeddings from CCCCC5 +author: John Snow Labs +name: roberta_english_annualreport_tuned_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_english_annualreport_tuned_pipeline` is a English model originally trained by CCCCC5. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_english_annualreport_tuned_pipeline_en_5.5.0_3.0_1726678865481.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_english_annualreport_tuned_pipeline_en_5.5.0_3.0_1726678865481.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_english_annualreport_tuned_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_english_annualreport_tuned_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_english_annualreport_tuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|425.5 MB| + +## References + +https://huggingface.co/CCCCC5/RoBERTa_English_AnnualReport_tuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roberta_mrqa_v2_en.md b/docs/_posts/ahmedlone127/2024-09-18-roberta_mrqa_v2_en.md new file mode 100644 index 00000000000000..59fac57a649f60 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roberta_mrqa_v2_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English roberta_mrqa_v2 RoBertaForQuestionAnswering from enriquesaou +author: John Snow Labs +name: roberta_mrqa_v2 +date: 2024-09-18 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_mrqa_v2` is a English model originally trained by enriquesaou. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_mrqa_v2_en_5.5.0_3.0_1726619683795.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_mrqa_v2_en_5.5.0_3.0_1726619683795.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_mrqa_v2","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_mrqa_v2", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_mrqa_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|464.2 MB| + +## References + +https://huggingface.co/enriquesaou/roberta_mrqa_v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-roebrta_base_val_test_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-roebrta_base_val_test_pipeline_en.md new file mode 100644 index 00000000000000..8391a6bc6e906e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-roebrta_base_val_test_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roebrta_base_val_test_pipeline pipeline RoBertaEmbeddings from Emanuel +author: John Snow Labs +name: roebrta_base_val_test_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roebrta_base_val_test_pipeline` is a English model originally trained by Emanuel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roebrta_base_val_test_pipeline_en_5.5.0_3.0_1726678055423.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roebrta_base_val_test_pipeline_en_5.5.0_3.0_1726678055423.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roebrta_base_val_test_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roebrta_base_val_test_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roebrta_base_val_test_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.4 MB| + +## References + +https://huggingface.co/Emanuel/roebrta-base-val-test + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_bert_based_ner_models_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-sent_bert_based_ner_models_pipeline_en.md new file mode 100644 index 00000000000000..e625036bb2f7ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_bert_based_ner_models_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_based_ner_models_pipeline pipeline BertSentenceEmbeddings from pragnakalp +author: John Snow Labs +name: sent_bert_based_ner_models_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_based_ner_models_pipeline` is a English model originally trained by pragnakalp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_based_ner_models_pipeline_en_5.5.0_3.0_1726661674559.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_based_ner_models_pipeline_en_5.5.0_3.0_1726661674559.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_based_ner_models_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_based_ner_models_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_based_ner_models_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|404.2 MB| + +## References + +https://huggingface.co/pragnakalp/bert_based_ner_models + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_distilbert_finetuned_imdb_neural_net_rahul_en.md b/docs/_posts/ahmedlone127/2024-09-18-sent_distilbert_finetuned_imdb_neural_net_rahul_en.md new file mode 100644 index 00000000000000..14c002da4e1cb3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_distilbert_finetuned_imdb_neural_net_rahul_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_distilbert_finetuned_imdb_neural_net_rahul BertSentenceEmbeddings from neural-net-rahul +author: John Snow Labs +name: sent_distilbert_finetuned_imdb_neural_net_rahul +date: 2024-09-18 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_distilbert_finetuned_imdb_neural_net_rahul` is a English model originally trained by neural-net-rahul. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_distilbert_finetuned_imdb_neural_net_rahul_en_5.5.0_3.0_1726676342331.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_distilbert_finetuned_imdb_neural_net_rahul_en_5.5.0_3.0_1726676342331.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_distilbert_finetuned_imdb_neural_net_rahul","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_distilbert_finetuned_imdb_neural_net_rahul","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_distilbert_finetuned_imdb_neural_net_rahul| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|403.6 MB| + +## References + +https://huggingface.co/neural-net-rahul/distilbert-finetuned-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-sent_esmlmt60_10000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-sent_esmlmt60_10000_pipeline_en.md new file mode 100644 index 00000000000000..b1c5efd43e78d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-sent_esmlmt60_10000_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_esmlmt60_10000_pipeline pipeline BertSentenceEmbeddings from hjkim811 +author: John Snow Labs +name: sent_esmlmt60_10000_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_esmlmt60_10000_pipeline` is a English model originally trained by hjkim811. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_esmlmt60_10000_pipeline_en_5.5.0_3.0_1726675897096.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_esmlmt60_10000_pipeline_en_5.5.0_3.0_1726675897096.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_esmlmt60_10000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_esmlmt60_10000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_esmlmt60_10000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|404.2 MB| + +## References + +https://huggingface.co/hjkim811/esmlmt60-10000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-test_finetuned__roberta_base_bne__augmented_ultrasounds_ner_en.md b/docs/_posts/ahmedlone127/2024-09-18-test_finetuned__roberta_base_bne__augmented_ultrasounds_ner_en.md new file mode 100644 index 00000000000000..402f8d17e7cb40 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-test_finetuned__roberta_base_bne__augmented_ultrasounds_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test_finetuned__roberta_base_bne__augmented_ultrasounds_ner RoBertaForTokenClassification from manucos +author: John Snow Labs +name: test_finetuned__roberta_base_bne__augmented_ultrasounds_ner +date: 2024-09-18 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_finetuned__roberta_base_bne__augmented_ultrasounds_ner` is a English model originally trained by manucos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_finetuned__roberta_base_bne__augmented_ultrasounds_ner_en_5.5.0_3.0_1726652421147.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_finetuned__roberta_base_bne__augmented_ultrasounds_ner_en_5.5.0_3.0_1726652421147.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("test_finetuned__roberta_base_bne__augmented_ultrasounds_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("test_finetuned__roberta_base_bne__augmented_ultrasounds_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_finetuned__roberta_base_bne__augmented_ultrasounds_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|463.7 MB| + +## References + +https://huggingface.co/manucos/test-finetuned__roberta-base-bne__augmented-ultrasounds-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-tmp_trainer_moisesdiazm_en.md b/docs/_posts/ahmedlone127/2024-09-18-tmp_trainer_moisesdiazm_en.md new file mode 100644 index 00000000000000..84e1cb4d5786f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-tmp_trainer_moisesdiazm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tmp_trainer_moisesdiazm DistilBertForSequenceClassification from moisesdiazm +author: John Snow Labs +name: tmp_trainer_moisesdiazm +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tmp_trainer_moisesdiazm` is a English model originally trained by moisesdiazm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tmp_trainer_moisesdiazm_en_5.5.0_3.0_1726696111952.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tmp_trainer_moisesdiazm_en_5.5.0_3.0_1726696111952.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("tmp_trainer_moisesdiazm","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("tmp_trainer_moisesdiazm", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tmp_trainer_moisesdiazm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/moisesdiazm/tmp_trainer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-topic_topic_random3_seed2_bernice_en.md b/docs/_posts/ahmedlone127/2024-09-18-topic_topic_random3_seed2_bernice_en.md new file mode 100644 index 00000000000000..f22860de3c59c2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-topic_topic_random3_seed2_bernice_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English topic_topic_random3_seed2_bernice XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: topic_topic_random3_seed2_bernice +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`topic_topic_random3_seed2_bernice` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/topic_topic_random3_seed2_bernice_en_5.5.0_3.0_1726672577417.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/topic_topic_random3_seed2_bernice_en_5.5.0_3.0_1726672577417.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("topic_topic_random3_seed2_bernice","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("topic_topic_random3_seed2_bernice", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|topic_topic_random3_seed2_bernice| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|805.6 MB| + +## References + +https://huggingface.co/tweettemposhift/topic-topic_random3_seed2-bernice \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-twitter_roberta_base_sentence_itr0_1e_05_all_01_03_2022_13_38_07_en.md b/docs/_posts/ahmedlone127/2024-09-18-twitter_roberta_base_sentence_itr0_1e_05_all_01_03_2022_13_38_07_en.md new file mode 100644 index 00000000000000..4c31999b670551 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-twitter_roberta_base_sentence_itr0_1e_05_all_01_03_2022_13_38_07_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English twitter_roberta_base_sentence_itr0_1e_05_all_01_03_2022_13_38_07 RoBertaForSequenceClassification from ali2066 +author: John Snow Labs +name: twitter_roberta_base_sentence_itr0_1e_05_all_01_03_2022_13_38_07 +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitter_roberta_base_sentence_itr0_1e_05_all_01_03_2022_13_38_07` is a English model originally trained by ali2066. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_sentence_itr0_1e_05_all_01_03_2022_13_38_07_en_5.5.0_3.0_1726689606081.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_sentence_itr0_1e_05_all_01_03_2022_13_38_07_en_5.5.0_3.0_1726689606081.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("twitter_roberta_base_sentence_itr0_1e_05_all_01_03_2022_13_38_07","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("twitter_roberta_base_sentence_itr0_1e_05_all_01_03_2022_13_38_07", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_roberta_base_sentence_itr0_1e_05_all_01_03_2022_13_38_07| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/ali2066/twitter-roberta-base_sentence_itr0_1e-05_all_01_03_2022-13_38_07 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_english_edwardjross_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_english_edwardjross_en.md new file mode 100644 index 00000000000000..0ab82603e16c31 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_english_edwardjross_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_edwardjross XlmRoBertaForTokenClassification from edwardjross +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_edwardjross +date: 2024-09-18 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_edwardjross` is a English model originally trained by edwardjross. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_edwardjross_en_5.5.0_3.0_1726657361764.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_edwardjross_en_5.5.0_3.0_1726657361764.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_edwardjross","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_edwardjross", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_edwardjross| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|827.2 MB| + +## References + +https://huggingface.co/edwardjross/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_english_jbreunig_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_english_jbreunig_pipeline_en.md new file mode 100644 index 00000000000000..37595361794584 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_english_jbreunig_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_jbreunig_pipeline pipeline XlmRoBertaForTokenClassification from jbreunig +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_jbreunig_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_jbreunig_pipeline` is a English model originally trained by jbreunig. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_jbreunig_pipeline_en_5.5.0_3.0_1726636620953.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_jbreunig_pipeline_en_5.5.0_3.0_1726636620953.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_jbreunig_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_english_jbreunig_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_jbreunig_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/jbreunig/xlm-roberta-base-finetuned-panx-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_french_gus07ven_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_french_gus07ven_pipeline_en.md new file mode 100644 index 00000000000000..a2f04cda7b7ae7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_finetuned_panx_german_french_gus07ven_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_gus07ven_pipeline pipeline XlmRoBertaForTokenClassification from gus07ven +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_gus07ven_pipeline +date: 2024-09-18 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_gus07ven_pipeline` is a English model originally trained by gus07ven. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_gus07ven_pipeline_en_5.5.0_3.0_1726664077690.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_gus07ven_pipeline_en_5.5.0_3.0_1726664077690.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_gus07ven_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_gus07ven_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_gus07ven_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/gus07ven/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_imdb_muzammil_eds_en.md b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_imdb_muzammil_eds_en.md new file mode 100644 index 00000000000000..c0f8573175e218 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-xlm_roberta_base_imdb_muzammil_eds_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_imdb_muzammil_eds XlmRoBertaForSequenceClassification from muzammil-eds +author: John Snow Labs +name: xlm_roberta_base_imdb_muzammil_eds +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_imdb_muzammil_eds` is a English model originally trained by muzammil-eds. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_imdb_muzammil_eds_en_5.5.0_3.0_1726660831500.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_imdb_muzammil_eds_en_5.5.0_3.0_1726660831500.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_imdb_muzammil_eds","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_imdb_muzammil_eds", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_imdb_muzammil_eds| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|860.9 MB| + +## References + +https://huggingface.co/muzammil-eds/xlm-roberta-base-IMDB \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-18-yiyang_test_en.md b/docs/_posts/ahmedlone127/2024-09-18-yiyang_test_en.md new file mode 100644 index 00000000000000..3d533a886e0907 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-18-yiyang_test_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English yiyang_test BertForSequenceClassification from yiyang0101 +author: John Snow Labs +name: yiyang_test +date: 2024-09-18 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`yiyang_test` is a English model originally trained by yiyang0101. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/yiyang_test_en_5.5.0_3.0_1726624162910.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/yiyang_test_en_5.5.0_3.0_1726624162910.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("yiyang_test","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("yiyang_test", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|yiyang_test| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/yiyang0101/yiyang-test \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_few_shot_k_64_finetuned_squad_seed_8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_few_shot_k_64_finetuned_squad_seed_8_pipeline_en.md new file mode 100644 index 00000000000000..257bd25e40c630 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_few_shot_k_64_finetuned_squad_seed_8_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_few_shot_k_64_finetuned_squad_seed_8_pipeline pipeline BertForQuestionAnswering from anas-awadalla +author: John Snow Labs +name: bert_base_uncased_few_shot_k_64_finetuned_squad_seed_8_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_few_shot_k_64_finetuned_squad_seed_8_pipeline` is a English model originally trained by anas-awadalla. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_few_shot_k_64_finetuned_squad_seed_8_pipeline_en_5.5.0_3.0_1726710470114.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_few_shot_k_64_finetuned_squad_seed_8_pipeline_en_5.5.0_3.0_1726710470114.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_few_shot_k_64_finetuned_squad_seed_8_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_few_shot_k_64_finetuned_squad_seed_8_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_few_shot_k_64_finetuned_squad_seed_8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/anas-awadalla/bert-base-uncased-few-shot-k-64-finetuned-squad-seed-8 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_finetuned_bible_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_finetuned_bible_pipeline_en.md new file mode 100644 index 00000000000000..2bb6fd1961ccb5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_base_uncased_finetuned_bible_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_bible_pipeline pipeline BertEmbeddings from Pragash-Mohanarajah +author: John Snow Labs +name: bert_base_uncased_finetuned_bible_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_bible_pipeline` is a English model originally trained by Pragash-Mohanarajah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_bible_pipeline_en_5.5.0_3.0_1726717677937.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_bible_pipeline_en_5.5.0_3.0_1726717677937.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_bible_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_bible_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_bible_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/Pragash-Mohanarajah/bert-base-uncased-finetuned-bible + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_finetuned_squad_ashaduzzaman_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_finetuned_squad_ashaduzzaman_pipeline_en.md new file mode 100644 index 00000000000000..9fb2e9f3cf2fca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_finetuned_squad_ashaduzzaman_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_finetuned_squad_ashaduzzaman_pipeline pipeline BertForQuestionAnswering from ashaduzzaman +author: John Snow Labs +name: bert_finetuned_squad_ashaduzzaman_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_squad_ashaduzzaman_pipeline` is a English model originally trained by ashaduzzaman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_ashaduzzaman_pipeline_en_5.5.0_3.0_1726765877820.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_ashaduzzaman_pipeline_en_5.5.0_3.0_1726765877820.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_finetuned_squad_ashaduzzaman_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_finetuned_squad_ashaduzzaman_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_squad_ashaduzzaman_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/ashaduzzaman/bert-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_l12_h256_uncased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_l12_h256_uncased_pipeline_en.md new file mode 100644 index 00000000000000..f70508c8bceeeb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_l12_h256_uncased_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_l12_h256_uncased_pipeline pipeline BertEmbeddings from gaunernst +author: John Snow Labs +name: bert_l12_h256_uncased_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_l12_h256_uncased_pipeline` is a English model originally trained by gaunernst. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_l12_h256_uncased_pipeline_en_5.5.0_3.0_1726744700213.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_l12_h256_uncased_pipeline_en_5.5.0_3.0_1726744700213.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_l12_h256_uncased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_l12_h256_uncased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_l12_h256_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|65.5 MB| + +## References + +https://huggingface.co/gaunernst/bert-L12-H256-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_two_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_two_en.md new file mode 100644 index 00000000000000..d845c5c6a86334 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_two_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_two BertEmbeddings from emma7897 +author: John Snow Labs +name: bert_two +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_two` is a English model originally trained by emma7897. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_two_en_5.5.0_3.0_1726744584268.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_two_en_5.5.0_3.0_1726744584268.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_two","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_two","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_two| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|403.6 MB| + +## References + +https://huggingface.co/emma7897/bert_two \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-bert_vllm_gemma2b_stringmatcher_newdataset_2_en.md b/docs/_posts/ahmedlone127/2024-09-19-bert_vllm_gemma2b_stringmatcher_newdataset_2_en.md new file mode 100644 index 00000000000000..d98939b614d987 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-bert_vllm_gemma2b_stringmatcher_newdataset_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_vllm_gemma2b_stringmatcher_newdataset_2 DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: bert_vllm_gemma2b_stringmatcher_newdataset_2 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_vllm_gemma2b_stringmatcher_newdataset_2` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_vllm_gemma2b_stringmatcher_newdataset_2_en_5.5.0_3.0_1726763564004.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_vllm_gemma2b_stringmatcher_newdataset_2_en_5.5.0_3.0_1726763564004.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_vllm_gemma2b_stringmatcher_newdataset_2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_vllm_gemma2b_stringmatcher_newdataset_2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_vllm_gemma2b_stringmatcher_newdataset_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/BERT_vllm-gemma2b-stringMatcher-newDataset_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-burmese_distilbert_imdb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-burmese_distilbert_imdb_pipeline_en.md new file mode 100644 index 00000000000000..495e519b72498f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-burmese_distilbert_imdb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_distilbert_imdb_pipeline pipeline DistilBertForSequenceClassification from nnhwin +author: John Snow Labs +name: burmese_distilbert_imdb_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_distilbert_imdb_pipeline` is a English model originally trained by nnhwin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_distilbert_imdb_pipeline_en_5.5.0_3.0_1726741101256.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_distilbert_imdb_pipeline_en_5.5.0_3.0_1726741101256.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_distilbert_imdb_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_distilbert_imdb_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_distilbert_imdb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/nnhwin/my-distilbert-imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-common_voice_lt.md b/docs/_posts/ahmedlone127/2024-09-19-common_voice_lt.md new file mode 100644 index 00000000000000..91e0aadbfd3116 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-common_voice_lt.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Lithuanian common_voice WhisperForCTC from Tomas1234 +author: John Snow Labs +name: common_voice +date: 2024-09-19 +tags: [lt, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: lt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`common_voice` is a Lithuanian model originally trained by Tomas1234. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/common_voice_lt_5.5.0_3.0_1726757233936.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/common_voice_lt_5.5.0_3.0_1726757233936.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("common_voice","lt") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("common_voice", "lt") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|common_voice| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|lt| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Tomas1234/common_voice \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distil_bert_v3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distil_bert_v3_pipeline_en.md new file mode 100644 index 00000000000000..cdbd03d090ab03 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distil_bert_v3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distil_bert_v3_pipeline pipeline DistilBertForSequenceClassification from KayraAksit +author: John Snow Labs +name: distil_bert_v3_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distil_bert_v3_pipeline` is a English model originally trained by KayraAksit. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distil_bert_v3_pipeline_en_5.5.0_3.0_1726744187131.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distil_bert_v3_pipeline_en_5.5.0_3.0_1726744187131.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distil_bert_v3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distil_bert_v3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distil_bert_v3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/KayraAksit/distil_bert_v3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_cased_distilled_squad_finetuned_squad_test3_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_cased_distilled_squad_finetuned_squad_test3_en.md new file mode 100644 index 00000000000000..cb0bb0fcfc1a96 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_cased_distilled_squad_finetuned_squad_test3_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English distilbert_base_cased_distilled_squad_finetuned_squad_test3 DistilBertForQuestionAnswering from allistair99 +author: John Snow Labs +name: distilbert_base_cased_distilled_squad_finetuned_squad_test3 +date: 2024-09-19 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_cased_distilled_squad_finetuned_squad_test3` is a English model originally trained by allistair99. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_distilled_squad_finetuned_squad_test3_en_5.5.0_3.0_1726727801103.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_distilled_squad_finetuned_squad_test3_en_5.5.0_3.0_1726727801103.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_cased_distilled_squad_finetuned_squad_test3","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("distilbert_base_cased_distilled_squad_finetuned_squad_test3", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_cased_distilled_squad_finetuned_squad_test3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|243.8 MB| + +## References + +https://huggingface.co/allistair99/distilbert-base-cased-distilled-squad-finetuned-squad-test3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_aicoder009_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_aicoder009_en.md new file mode 100644 index 00000000000000..8997c2ac6eb8ec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_aicoder009_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_aicoder009 DistilBertForSequenceClassification from AICODER009 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_aicoder009 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_aicoder009` is a English model originally trained by AICODER009. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_aicoder009_en_5.5.0_3.0_1726704675610.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_aicoder009_en_5.5.0_3.0_1726704675610.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_aicoder009","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_aicoder009", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_aicoder009| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/AICODER009/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_cokuun_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_cokuun_pipeline_en.md new file mode 100644 index 00000000000000..ad3458116858c5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_emotion_cokuun_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_cokuun_pipeline pipeline DistilBertForSequenceClassification from cokuun +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_cokuun_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_cokuun_pipeline` is a English model originally trained by cokuun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_cokuun_pipeline_en_5.5.0_3.0_1726741248606.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_cokuun_pipeline_en_5.5.0_3.0_1726741248606.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_cokuun_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_cokuun_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_cokuun_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/cokuun/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_sst_2_english_toxicity_ft_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_sst_2_english_toxicity_ft_en.md new file mode 100644 index 00000000000000..5d50d12dc58a2e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_finetuned_sst_2_english_toxicity_ft_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_sst_2_english_toxicity_ft DistilBertForSequenceClassification from fatmhd1995 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_sst_2_english_toxicity_ft +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_sst_2_english_toxicity_ft` is a English model originally trained by fatmhd1995. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_sst_2_english_toxicity_ft_en_5.5.0_3.0_1726740692106.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_sst_2_english_toxicity_ft_en_5.5.0_3.0_1726740692106.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_sst_2_english_toxicity_ft","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_sst_2_english_toxicity_ft", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_sst_2_english_toxicity_ft| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/fatmhd1995/distilbert-base-uncased-finetuned-sst-2-english-TOXICITY-FT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1large7pfxnf_simsp400_clean300_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1large7pfxnf_simsp400_clean300_pipeline_en.md new file mode 100644 index 00000000000000..0ab39e2a10a667 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1large7pfxnf_simsp400_clean300_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1large7pfxnf_simsp400_clean300_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1large7pfxnf_simsp400_clean300_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1large7pfxnf_simsp400_clean300_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1large7pfxnf_simsp400_clean300_pipeline_en_5.5.0_3.0_1726742693449.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1large7pfxnf_simsp400_clean300_pipeline_en_5.5.0_3.0_1726742693449.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1large7pfxnf_simsp400_clean300_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1large7pfxnf_simsp400_clean300_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st7sd_ut72ut1large7pfxnf_simsp400_clean300_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st7sd_ut72ut1large7PfxNf_simsp400_clean300 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_travel_zphr_1st_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_travel_zphr_1st_en.md new file mode 100644 index 00000000000000..1467fa63715e34 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_base_uncased_travel_zphr_1st_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_travel_zphr_1st DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_travel_zphr_1st +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_travel_zphr_1st` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_1st_en_5.5.0_3.0_1726719402788.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_1st_en_5.5.0_3.0_1726719402788.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_travel_zphr_1st","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_travel_zphr_1st", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_travel_zphr_1st| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_travel_zphr_1st \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilbert_fine_tuned_rte_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilbert_fine_tuned_rte_en.md new file mode 100644 index 00000000000000..fba84d78a4ed11 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilbert_fine_tuned_rte_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_fine_tuned_rte DistilBertForSequenceClassification from rycecorn +author: John Snow Labs +name: distilbert_fine_tuned_rte +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_fine_tuned_rte` is a English model originally trained by rycecorn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_fine_tuned_rte_en_5.5.0_3.0_1726704234954.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_fine_tuned_rte_en_5.5.0_3.0_1726704234954.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_fine_tuned_rte","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_fine_tuned_rte", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_fine_tuned_rte| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rycecorn/DistilBert-fine-tuned-RTE \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-distilkobert_ep2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-distilkobert_ep2_pipeline_en.md new file mode 100644 index 00000000000000..af18809981227e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-distilkobert_ep2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilkobert_ep2_pipeline pipeline DistilBertForSequenceClassification from yeye776 +author: John Snow Labs +name: distilkobert_ep2_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilkobert_ep2_pipeline` is a English model originally trained by yeye776. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilkobert_ep2_pipeline_en_5.5.0_3.0_1726763338735.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilkobert_ep2_pipeline_en_5.5.0_3.0_1726763338735.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilkobert_ep2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilkobert_ep2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilkobert_ep2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|106.5 MB| + +## References + +https://huggingface.co/yeye776/DistilKoBERT-ep2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-emotion_predictor_for_emotion_chat_bot_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-emotion_predictor_for_emotion_chat_bot_pipeline_en.md new file mode 100644 index 00000000000000..a75b2972f85ef6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-emotion_predictor_for_emotion_chat_bot_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English emotion_predictor_for_emotion_chat_bot_pipeline pipeline RoBertaForSequenceClassification from Shotaro30678 +author: John Snow Labs +name: emotion_predictor_for_emotion_chat_bot_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`emotion_predictor_for_emotion_chat_bot_pipeline` is a English model originally trained by Shotaro30678. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/emotion_predictor_for_emotion_chat_bot_pipeline_en_5.5.0_3.0_1726726706574.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/emotion_predictor_for_emotion_chat_bot_pipeline_en_5.5.0_3.0_1726726706574.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("emotion_predictor_for_emotion_chat_bot_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("emotion_predictor_for_emotion_chat_bot_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|emotion_predictor_for_emotion_chat_bot_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.9 MB| + +## References + +https://huggingface.co/Shotaro30678/emotion_predictor_for_emotion_chat_bot + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-imdb_binary_classifier_roberta_base_en.md b/docs/_posts/ahmedlone127/2024-09-19-imdb_binary_classifier_roberta_base_en.md new file mode 100644 index 00000000000000..1336d5199a25f1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-imdb_binary_classifier_roberta_base_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English imdb_binary_classifier_roberta_base RoBertaForSequenceClassification from againeureka +author: John Snow Labs +name: imdb_binary_classifier_roberta_base +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imdb_binary_classifier_roberta_base` is a English model originally trained by againeureka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imdb_binary_classifier_roberta_base_en_5.5.0_3.0_1726725835175.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imdb_binary_classifier_roberta_base_en_5.5.0_3.0_1726725835175.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("imdb_binary_classifier_roberta_base","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("imdb_binary_classifier_roberta_base", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imdb_binary_classifier_roberta_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|464.2 MB| + +## References + +https://huggingface.co/againeureka/imdb_binary_classifier_roberta_base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-marathi_marh_val_dn_mr.md b/docs/_posts/ahmedlone127/2024-09-19-marathi_marh_val_dn_mr.md new file mode 100644 index 00000000000000..58e5a0bacffa3d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-marathi_marh_val_dn_mr.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Marathi marathi_marh_val_dn WhisperForCTC from simran14 +author: John Snow Labs +name: marathi_marh_val_dn +date: 2024-09-19 +tags: [mr, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: mr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marathi_marh_val_dn` is a Marathi model originally trained by simran14. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marathi_marh_val_dn_mr_5.5.0_3.0_1726714260416.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marathi_marh_val_dn_mr_5.5.0_3.0_1726714260416.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("marathi_marh_val_dn","mr") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("marathi_marh_val_dn", "mr") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marathi_marh_val_dn| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|mr| +|Size:|1.7 GB| + +## References + +https://huggingface.co/simran14/mr-val-dn \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-mlm_finetunedmodel_test_en.md b/docs/_posts/ahmedlone127/2024-09-19-mlm_finetunedmodel_test_en.md new file mode 100644 index 00000000000000..b63dce5a80ca56 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-mlm_finetunedmodel_test_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mlm_finetunedmodel_test RoBertaEmbeddings from shradha01 +author: John Snow Labs +name: mlm_finetunedmodel_test +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mlm_finetunedmodel_test` is a English model originally trained by shradha01. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mlm_finetunedmodel_test_en_5.5.0_3.0_1726746959343.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mlm_finetunedmodel_test_en_5.5.0_3.0_1726746959343.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("mlm_finetunedmodel_test","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("mlm_finetunedmodel_test","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mlm_finetunedmodel_test| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/shradha01/MLM_FinetunedModel_test \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-n_roberta_imdb_padding90model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-n_roberta_imdb_padding90model_pipeline_en.md new file mode 100644 index 00000000000000..c7efe3e82b62f1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-n_roberta_imdb_padding90model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English n_roberta_imdb_padding90model_pipeline pipeline RoBertaForSequenceClassification from Realgon +author: John Snow Labs +name: n_roberta_imdb_padding90model_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_roberta_imdb_padding90model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_roberta_imdb_padding90model_pipeline_en_5.5.0_3.0_1726780496853.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_roberta_imdb_padding90model_pipeline_en_5.5.0_3.0_1726780496853.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("n_roberta_imdb_padding90model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("n_roberta_imdb_padding90model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_roberta_imdb_padding90model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|463.3 MB| + +## References + +https://huggingface.co/Realgon/N_roberta_imdb_padding90model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-nerd_nerd_random2_seed0_twitter_roberta_base_2022_154m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-nerd_nerd_random2_seed0_twitter_roberta_base_2022_154m_pipeline_en.md new file mode 100644 index 00000000000000..c7c58d062f5bc3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-nerd_nerd_random2_seed0_twitter_roberta_base_2022_154m_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nerd_nerd_random2_seed0_twitter_roberta_base_2022_154m_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: nerd_nerd_random2_seed0_twitter_roberta_base_2022_154m_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nerd_nerd_random2_seed0_twitter_roberta_base_2022_154m_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nerd_nerd_random2_seed0_twitter_roberta_base_2022_154m_pipeline_en_5.5.0_3.0_1726750603997.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nerd_nerd_random2_seed0_twitter_roberta_base_2022_154m_pipeline_en_5.5.0_3.0_1726750603997.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nerd_nerd_random2_seed0_twitter_roberta_base_2022_154m_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nerd_nerd_random2_seed0_twitter_roberta_base_2022_154m_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nerd_nerd_random2_seed0_twitter_roberta_base_2022_154m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/tweettemposhift/nerd-nerd_random2_seed0-twitter-roberta-base-2022-154m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-paludistilbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-paludistilbert_pipeline_en.md new file mode 100644 index 00000000000000..697f09cdf43d1b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-paludistilbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English paludistilbert_pipeline pipeline DistilBertForSequenceClassification from Palu001 +author: John Snow Labs +name: paludistilbert_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`paludistilbert_pipeline` is a English model originally trained by Palu001. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/paludistilbert_pipeline_en_5.5.0_3.0_1726742584799.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/paludistilbert_pipeline_en_5.5.0_3.0_1726742584799.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("paludistilbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("paludistilbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|paludistilbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Palu001/PaluDistilbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-polarizer_bert_base_uncased_en.md b/docs/_posts/ahmedlone127/2024-09-19-polarizer_bert_base_uncased_en.md new file mode 100644 index 00000000000000..1076dde1ebadaa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-polarizer_bert_base_uncased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English polarizer_bert_base_uncased BertEmbeddings from kyungmin011029 +author: John Snow Labs +name: polarizer_bert_base_uncased +date: 2024-09-19 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`polarizer_bert_base_uncased` is a English model originally trained by kyungmin011029. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/polarizer_bert_base_uncased_en_5.5.0_3.0_1726744755315.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/polarizer_bert_base_uncased_en_5.5.0_3.0_1726744755315.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("polarizer_bert_base_uncased","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("polarizer_bert_base_uncased","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|polarizer_bert_base_uncased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/kyungmin011029/Polarizer-bert-base-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-regr_2_en.md b/docs/_posts/ahmedlone127/2024-09-19-regr_2_en.md new file mode 100644 index 00000000000000..c87a45f6534d9b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-regr_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English regr_2 RoBertaForSequenceClassification from BaronSch +author: John Snow Labs +name: regr_2 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`regr_2` is a English model originally trained by BaronSch. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/regr_2_en_5.5.0_3.0_1726780180628.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/regr_2_en_5.5.0_3.0_1726780180628.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("regr_2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("regr_2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|regr_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.5 MB| + +## References + +https://huggingface.co/BaronSch/Regr_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_classifier_autonlp_fake_covid_news_36769078_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_classifier_autonlp_fake_covid_news_36769078_pipeline_en.md new file mode 100644 index 00000000000000..1cc85836d7b8e4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_classifier_autonlp_fake_covid_news_36769078_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_classifier_autonlp_fake_covid_news_36769078_pipeline pipeline RoBertaForSequenceClassification from Qinghui +author: John Snow Labs +name: roberta_classifier_autonlp_fake_covid_news_36769078_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_classifier_autonlp_fake_covid_news_36769078_pipeline` is a English model originally trained by Qinghui. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_classifier_autonlp_fake_covid_news_36769078_pipeline_en_5.5.0_3.0_1726780097198.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_classifier_autonlp_fake_covid_news_36769078_pipeline_en_5.5.0_3.0_1726780097198.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_classifier_autonlp_fake_covid_news_36769078_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_classifier_autonlp_fake_covid_news_36769078_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_classifier_autonlp_fake_covid_news_36769078_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Qinghui/autonlp-fake-covid-news-36769078 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_clinical_wl_spanish_ner_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_clinical_wl_spanish_ner_en.md new file mode 100644 index 00000000000000..5686620f139ff0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_clinical_wl_spanish_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_clinical_wl_spanish_ner RoBertaForTokenClassification from manucos +author: John Snow Labs +name: roberta_clinical_wl_spanish_ner +date: 2024-09-19 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_clinical_wl_spanish_ner` is a English model originally trained by manucos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_clinical_wl_spanish_ner_en_5.5.0_3.0_1726729290278.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_clinical_wl_spanish_ner_en_5.5.0_3.0_1726729290278.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_clinical_wl_spanish_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_clinical_wl_spanish_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_clinical_wl_spanish_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|469.7 MB| + +## References + +https://huggingface.co/manucos/roberta-clinical-wl-es-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_clinical_wl_spanish_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_clinical_wl_spanish_ner_pipeline_en.md new file mode 100644 index 00000000000000..a44e92a566e7c6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_clinical_wl_spanish_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_clinical_wl_spanish_ner_pipeline pipeline RoBertaForTokenClassification from manucos +author: John Snow Labs +name: roberta_clinical_wl_spanish_ner_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_clinical_wl_spanish_ner_pipeline` is a English model originally trained by manucos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_clinical_wl_spanish_ner_pipeline_en_5.5.0_3.0_1726729313429.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_clinical_wl_spanish_ner_pipeline_en_5.5.0_3.0_1726729313429.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_clinical_wl_spanish_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_clinical_wl_spanish_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_clinical_wl_spanish_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|469.8 MB| + +## References + +https://huggingface.co/manucos/roberta-clinical-wl-es-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_shopee_sentiment_gadgets_tl.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_shopee_sentiment_gadgets_tl.md new file mode 100644 index 00000000000000..8ba05eaf36f0ed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_shopee_sentiment_gadgets_tl.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Tagalog roberta_shopee_sentiment_gadgets RoBertaForSequenceClassification from magixxixx +author: John Snow Labs +name: roberta_shopee_sentiment_gadgets +date: 2024-09-19 +tags: [tl, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: tl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_shopee_sentiment_gadgets` is a Tagalog model originally trained by magixxixx. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_shopee_sentiment_gadgets_tl_5.5.0_3.0_1726779745735.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_shopee_sentiment_gadgets_tl_5.5.0_3.0_1726779745735.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_shopee_sentiment_gadgets","tl") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_shopee_sentiment_gadgets", "tl") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_shopee_sentiment_gadgets| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|tl| +|Size:|409.3 MB| + +## References + +https://huggingface.co/magixxixx/roberta-shopee-sentiment-gadgets \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-roberta_untrained_1eps_seed995_en.md b/docs/_posts/ahmedlone127/2024-09-19-roberta_untrained_1eps_seed995_en.md new file mode 100644 index 00000000000000..81ca01fb01b08d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-roberta_untrained_1eps_seed995_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_untrained_1eps_seed995 RoBertaForSequenceClassification from custeau +author: John Snow Labs +name: roberta_untrained_1eps_seed995 +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_untrained_1eps_seed995` is a English model originally trained by custeau. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_untrained_1eps_seed995_en_5.5.0_3.0_1726779857199.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_untrained_1eps_seed995_en_5.5.0_3.0_1726779857199.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_untrained_1eps_seed995","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_untrained_1eps_seed995", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_untrained_1eps_seed995| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|447.8 MB| + +## References + +https://huggingface.co/custeau/roberta_untrained_1eps_seed995 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-robertabase_subjectivity_1_actual_en.md b/docs/_posts/ahmedlone127/2024-09-19-robertabase_subjectivity_1_actual_en.md new file mode 100644 index 00000000000000..e909c92bd39e5b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-robertabase_subjectivity_1_actual_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English robertabase_subjectivity_1_actual RoBertaForSequenceClassification from Muffins987 +author: John Snow Labs +name: robertabase_subjectivity_1_actual +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robertabase_subjectivity_1_actual` is a English model originally trained by Muffins987. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robertabase_subjectivity_1_actual_en_5.5.0_3.0_1726780212339.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robertabase_subjectivity_1_actual_en_5.5.0_3.0_1726780212339.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("robertabase_subjectivity_1_actual","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("robertabase_subjectivity_1_actual", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robertabase_subjectivity_1_actual| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|466.5 MB| + +## References + +https://huggingface.co/Muffins987/robertabase-subjectivity-1-actual \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-rubioroberta_neg_en.md b/docs/_posts/ahmedlone127/2024-09-19-rubioroberta_neg_en.md new file mode 100644 index 00000000000000..1ab30ef0dad4f2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-rubioroberta_neg_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English rubioroberta_neg RoBertaForTokenClassification from DimasikKurd +author: John Snow Labs +name: rubioroberta_neg +date: 2024-09-19 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rubioroberta_neg` is a English model originally trained by DimasikKurd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rubioroberta_neg_en_5.5.0_3.0_1726731289370.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rubioroberta_neg_en_5.5.0_3.0_1726731289370.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("rubioroberta_neg","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("rubioroberta_neg", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rubioroberta_neg| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/DimasikKurd/RuBioRoBERTa_neg \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-schem_roberta_text_disagreement_binary_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-schem_roberta_text_disagreement_binary_classifier_pipeline_en.md new file mode 100644 index 00000000000000..825148a4f53a4b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-schem_roberta_text_disagreement_binary_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English schem_roberta_text_disagreement_binary_classifier_pipeline pipeline RoBertaForSequenceClassification from RuyuanWan +author: John Snow Labs +name: schem_roberta_text_disagreement_binary_classifier_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`schem_roberta_text_disagreement_binary_classifier_pipeline` is a English model originally trained by RuyuanWan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/schem_roberta_text_disagreement_binary_classifier_pipeline_en_5.5.0_3.0_1726733127519.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/schem_roberta_text_disagreement_binary_classifier_pipeline_en_5.5.0_3.0_1726733127519.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("schem_roberta_text_disagreement_binary_classifier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("schem_roberta_text_disagreement_binary_classifier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|schem_roberta_text_disagreement_binary_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|422.9 MB| + +## References + +https://huggingface.co/RuyuanWan/SChem_RoBERTa_Text_Disagreement_Binary_Classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-sent_bert_large_ct_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-sent_bert_large_ct_pipeline_en.md new file mode 100644 index 00000000000000..0a466c6b65d9a9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-sent_bert_large_ct_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_large_ct_pipeline pipeline BertSentenceEmbeddings from Contrastive-Tension +author: John Snow Labs +name: sent_bert_large_ct_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_large_ct_pipeline` is a English model originally trained by Contrastive-Tension. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_large_ct_pipeline_en_5.5.0_3.0_1726728805430.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_large_ct_pipeline_en_5.5.0_3.0_1726728805430.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_large_ct_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_large_ct_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_large_ct_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Contrastive-Tension/BERT-Large-CT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-sent_phs_bert_en.md b/docs/_posts/ahmedlone127/2024-09-19-sent_phs_bert_en.md new file mode 100644 index 00000000000000..89aece1b87571a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-sent_phs_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_phs_bert BertSentenceEmbeddings from publichealthsurveillance +author: John Snow Labs +name: sent_phs_bert +date: 2024-09-19 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_phs_bert` is a English model originally trained by publichealthsurveillance. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_phs_bert_en_5.5.0_3.0_1726782711992.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_phs_bert_en_5.5.0_3.0_1726782711992.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_phs_bert","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_phs_bert","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_phs_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/publichealthsurveillance/PHS-BERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-sentiment_analysis_finetuning_en.md b/docs/_posts/ahmedlone127/2024-09-19-sentiment_analysis_finetuning_en.md new file mode 100644 index 00000000000000..b3d9457c704776 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-sentiment_analysis_finetuning_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_analysis_finetuning RoBertaForSequenceClassification from Asif1997 +author: John Snow Labs +name: sentiment_analysis_finetuning +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_finetuning` is a English model originally trained by Asif1997. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_finetuning_en_5.5.0_3.0_1726751205326.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_finetuning_en_5.5.0_3.0_1726751205326.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_analysis_finetuning","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiment_analysis_finetuning", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_finetuning| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|456.9 MB| + +## References + +https://huggingface.co/Asif1997/Sentiment-Analysis-Finetuning \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-trained_dilibert_sentiment_analysis_amolinab_en.md b/docs/_posts/ahmedlone127/2024-09-19-trained_dilibert_sentiment_analysis_amolinab_en.md new file mode 100644 index 00000000000000..9b51c8876a32d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-trained_dilibert_sentiment_analysis_amolinab_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English trained_dilibert_sentiment_analysis_amolinab DistilBertForSequenceClassification from amolinab +author: John Snow Labs +name: trained_dilibert_sentiment_analysis_amolinab +date: 2024-09-19 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trained_dilibert_sentiment_analysis_amolinab` is a English model originally trained by amolinab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trained_dilibert_sentiment_analysis_amolinab_en_5.5.0_3.0_1726740889921.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trained_dilibert_sentiment_analysis_amolinab_en_5.5.0_3.0_1726740889921.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("trained_dilibert_sentiment_analysis_amolinab","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("trained_dilibert_sentiment_analysis_amolinab", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trained_dilibert_sentiment_analysis_amolinab| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/amolinab/trained_dilibert_sentiment_analysis \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_base_indonesian_rizka_pipeline_id.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_base_indonesian_rizka_pipeline_id.md new file mode 100644 index 00000000000000..0a32fe682960f0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_base_indonesian_rizka_pipeline_id.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Indonesian whisper_base_indonesian_rizka_pipeline pipeline WhisperForCTC from Rizka +author: John Snow Labs +name: whisper_base_indonesian_rizka_pipeline +date: 2024-09-19 +tags: [id, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: id +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_indonesian_rizka_pipeline` is a Indonesian model originally trained by Rizka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_indonesian_rizka_pipeline_id_5.5.0_3.0_1726759830534.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_indonesian_rizka_pipeline_id_5.5.0_3.0_1726759830534.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_base_indonesian_rizka_pipeline", lang = "id") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_base_indonesian_rizka_pipeline", lang = "id") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_indonesian_rizka_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|id| +|Size:|642.1 MB| + +## References + +https://huggingface.co/Rizka/whisper-base-id + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_minds14_english_chugyouk_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_minds14_english_chugyouk_pipeline_en.md new file mode 100644 index 00000000000000..e32914a30699c8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-whisper_tiny_minds14_english_chugyouk_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_minds14_english_chugyouk_pipeline pipeline WhisperForCTC from ChuGyouk +author: John Snow Labs +name: whisper_tiny_minds14_english_chugyouk_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_minds14_english_chugyouk_pipeline` is a English model originally trained by ChuGyouk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_english_chugyouk_pipeline_en_5.5.0_3.0_1726787472501.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_english_chugyouk_pipeline_en_5.5.0_3.0_1726787472501.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_minds14_english_chugyouk_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_minds14_english_chugyouk_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_minds14_english_chugyouk_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.0 MB| + +## References + +https://huggingface.co/ChuGyouk/whisper-tiny-minds14-en + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_french_chaoli_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_french_chaoli_en.md new file mode 100644 index 00000000000000..6f351c356a3b82 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_french_chaoli_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_chaoli XlmRoBertaForTokenClassification from ChaoLi +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_chaoli +date: 2024-09-19 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_chaoli` is a English model originally trained by ChaoLi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_chaoli_en_5.5.0_3.0_1726708981131.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_chaoli_en_5.5.0_3.0_1726708981131.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_chaoli","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_chaoli", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_chaoli| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/ChaoLi/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_michaelkim_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_michaelkim_pipeline_en.md new file mode 100644 index 00000000000000..cdd52296b0d625 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-19-xlm_roberta_base_finetuned_panx_german_michaelkim_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_michaelkim_pipeline pipeline XlmRoBertaForTokenClassification from MichaelKim +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_michaelkim_pipeline +date: 2024-09-19 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_michaelkim_pipeline` is a English model originally trained by MichaelKim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_michaelkim_pipeline_en_5.5.0_3.0_1726754277444.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_michaelkim_pipeline_en_5.5.0_3.0_1726754277444.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_michaelkim_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_michaelkim_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_michaelkim_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|840.8 MB| + +## References + +https://huggingface.co/MichaelKim/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-2020_q4_25p_filtered_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-2020_q4_25p_filtered_pipeline_en.md new file mode 100644 index 00000000000000..40a5311a4fe505 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-2020_q4_25p_filtered_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 2020_q4_25p_filtered_pipeline pipeline RoBertaEmbeddings from DouglasPontes +author: John Snow Labs +name: 2020_q4_25p_filtered_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q4_25p_filtered_pipeline` is a English model originally trained by DouglasPontes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q4_25p_filtered_pipeline_en_5.5.0_3.0_1726796745516.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q4_25p_filtered_pipeline_en_5.5.0_3.0_1726796745516.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("2020_q4_25p_filtered_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("2020_q4_25p_filtered_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q4_25p_filtered_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/DouglasPontes/2020-Q4-25p-filtered + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-a_nepal_bhasa_repo_edurayan_en.md b/docs/_posts/ahmedlone127/2024-09-20-a_nepal_bhasa_repo_edurayan_en.md new file mode 100644 index 00000000000000..cfa404a3612852 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-a_nepal_bhasa_repo_edurayan_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English a_nepal_bhasa_repo_edurayan DistilBertForSequenceClassification from EduRayan +author: John Snow Labs +name: a_nepal_bhasa_repo_edurayan +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`a_nepal_bhasa_repo_edurayan` is a English model originally trained by EduRayan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/a_nepal_bhasa_repo_edurayan_en_5.5.0_3.0_1726871734597.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/a_nepal_bhasa_repo_edurayan_en_5.5.0_3.0_1726871734597.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("a_nepal_bhasa_repo_edurayan","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("a_nepal_bhasa_repo_edurayan", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|a_nepal_bhasa_repo_edurayan| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/EduRayan/A-new-repo \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-all_roberta_large_v1_travel_4_16_5_oos_en.md b/docs/_posts/ahmedlone127/2024-09-20-all_roberta_large_v1_travel_4_16_5_oos_en.md new file mode 100644 index 00000000000000..0ddb69fbe649d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-all_roberta_large_v1_travel_4_16_5_oos_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English all_roberta_large_v1_travel_4_16_5_oos RoBertaForSequenceClassification from fathyshalab +author: John Snow Labs +name: all_roberta_large_v1_travel_4_16_5_oos +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`all_roberta_large_v1_travel_4_16_5_oos` is a English model originally trained by fathyshalab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_travel_4_16_5_oos_en_5.5.0_3.0_1726804833301.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/all_roberta_large_v1_travel_4_16_5_oos_en_5.5.0_3.0_1726804833301.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_travel_4_16_5_oos","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("all_roberta_large_v1_travel_4_16_5_oos", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|all_roberta_large_v1_travel_4_16_5_oos| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/fathyshalab/all-roberta-large-v1-travel-4-16-5-oos \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-azbertacontextualizedwordembeddingsinazerbaijanilanguage_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-azbertacontextualizedwordembeddingsinazerbaijanilanguage_pipeline_en.md new file mode 100644 index 00000000000000..ec530243fcb428 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-azbertacontextualizedwordembeddingsinazerbaijanilanguage_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English azbertacontextualizedwordembeddingsinazerbaijanilanguage_pipeline pipeline RoBertaEmbeddings from turalizada +author: John Snow Labs +name: azbertacontextualizedwordembeddingsinazerbaijanilanguage_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`azbertacontextualizedwordembeddingsinazerbaijanilanguage_pipeline` is a English model originally trained by turalizada. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/azbertacontextualizedwordembeddingsinazerbaijanilanguage_pipeline_en_5.5.0_3.0_1726857785539.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/azbertacontextualizedwordembeddingsinazerbaijanilanguage_pipeline_en_5.5.0_3.0_1726857785539.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("azbertacontextualizedwordembeddingsinazerbaijanilanguage_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("azbertacontextualizedwordembeddingsinazerbaijanilanguage_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|azbertacontextualizedwordembeddingsinazerbaijanilanguage_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/turalizada/AzBERTaContextualizedWordEmbeddingsinAzerbaijaniLanguage + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-b001_cleaned_en.md b/docs/_posts/ahmedlone127/2024-09-20-b001_cleaned_en.md new file mode 100644 index 00000000000000..af8a774d8f110e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-b001_cleaned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English b001_cleaned DistilBertForSequenceClassification from Theoreticallyhugo +author: John Snow Labs +name: b001_cleaned +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`b001_cleaned` is a English model originally trained by Theoreticallyhugo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/b001_cleaned_en_5.5.0_3.0_1726871524721.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/b001_cleaned_en_5.5.0_3.0_1726871524721.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("b001_cleaned","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("b001_cleaned", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|b001_cleaned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Theoreticallyhugo/B001_cleaned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_250_redo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_250_redo_pipeline_en.md new file mode 100644 index 00000000000000..87cffc4a6b9908 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_250_redo_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_250_redo_pipeline pipeline DistilBertForSequenceClassification from intrinsic-disorder +author: John Snow Labs +name: bert_250_redo_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_250_redo_pipeline` is a English model originally trained by intrinsic-disorder. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_250_redo_pipeline_en_5.5.0_3.0_1726861211299.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_250_redo_pipeline_en_5.5.0_3.0_1726861211299.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_250_redo_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_250_redo_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_250_redo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/intrinsic-disorder/bert-250-redo + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_250k_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_250k_pipeline_en.md new file mode 100644 index 00000000000000..fb2bfd8319080d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_250k_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_250k_pipeline pipeline DistilBertForSequenceClassification from intrinsic-disorder +author: John Snow Labs +name: bert_250k_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_250k_pipeline` is a English model originally trained by intrinsic-disorder. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_250k_pipeline_en_5.5.0_3.0_1726842476788.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_250k_pipeline_en_5.5.0_3.0_1726842476788.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_250k_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_250k_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_250k_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/intrinsic-disorder/bert-250k + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_klue_mrc_finetuned_jihoonkimharu_ko.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_klue_mrc_finetuned_jihoonkimharu_ko.md new file mode 100644 index 00000000000000..cf607f197b8618 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_klue_mrc_finetuned_jihoonkimharu_ko.md @@ -0,0 +1,86 @@ +--- +layout: model +title: Korean bert_base_klue_mrc_finetuned_jihoonkimharu BertForQuestionAnswering from jihoonkimharu +author: John Snow Labs +name: bert_base_klue_mrc_finetuned_jihoonkimharu +date: 2024-09-20 +tags: [ko, open_source, onnx, question_answering, bert] +task: Question Answering +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_klue_mrc_finetuned_jihoonkimharu` is a Korean model originally trained by jihoonkimharu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_klue_mrc_finetuned_jihoonkimharu_ko_5.5.0_3.0_1726820644319.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_klue_mrc_finetuned_jihoonkimharu_ko_5.5.0_3.0_1726820644319.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_klue_mrc_finetuned_jihoonkimharu","ko") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_klue_mrc_finetuned_jihoonkimharu", "ko") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_klue_mrc_finetuned_jihoonkimharu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|ko| +|Size:|412.4 MB| + +## References + +https://huggingface.co/jihoonkimharu/bert-base-klue-mrc-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_finetune_squad_ep_2_25_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_finetune_squad_ep_2_25_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline_en.md new file mode 100644 index 00000000000000..27906bf6a53755 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_base_uncased_finetune_squad_ep_2_25_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_25_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_25_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_25_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_25_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline_en_5.5.0_3.0_1726834048789.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_25_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline_en_5.5.0_3.0_1726834048789.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_25_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_25_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_25_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.25-lr-4e-07-wd-1e-05-dp-0.3-ss-0-st-False-fh-False-hs-600 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_finetuned_ner_mbalos_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_finetuned_ner_mbalos_en.md new file mode 100644 index 00000000000000..9ec80c810db6e6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_finetuned_ner_mbalos_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_finetuned_ner_mbalos BertForTokenClassification from mbalos +author: John Snow Labs +name: bert_finetuned_ner_mbalos +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_ner_mbalos` is a English model originally trained by mbalos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_mbalos_en_5.5.0_3.0_1726840237378.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_mbalos_en_5.5.0_3.0_1726840237378.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_mbalos","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_mbalos", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_mbalos| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/mbalos/bert-finetuned-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_jigsaw_severetoxic_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_jigsaw_severetoxic_pipeline_en.md new file mode 100644 index 00000000000000..4f56f09813da40 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_jigsaw_severetoxic_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_jigsaw_severetoxic_pipeline pipeline BertForSequenceClassification from Cameron +author: John Snow Labs +name: bert_jigsaw_severetoxic_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_jigsaw_severetoxic_pipeline` is a English model originally trained by Cameron. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_jigsaw_severetoxic_pipeline_en_5.5.0_3.0_1726859956011.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_jigsaw_severetoxic_pipeline_en_5.5.0_3.0_1726859956011.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_jigsaw_severetoxic_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_jigsaw_severetoxic_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_jigsaw_severetoxic_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/Cameron/BERT-jigsaw-severetoxic + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-bert_tiny_squadv2_en.md b/docs/_posts/ahmedlone127/2024-09-20-bert_tiny_squadv2_en.md new file mode 100644 index 00000000000000..b031290b2e19a7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-bert_tiny_squadv2_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_tiny_squadv2 BertForQuestionAnswering from VenkatManda +author: John Snow Labs +name: bert_tiny_squadv2 +date: 2024-09-20 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_tiny_squadv2` is a English model originally trained by VenkatManda. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_tiny_squadv2_en_5.5.0_3.0_1726820437390.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_tiny_squadv2_en_5.5.0_3.0_1726820437390.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_tiny_squadv2","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_tiny_squadv2", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_tiny_squadv2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|16.7 MB| + +## References + +https://huggingface.co/VenkatManda/bert-tiny-squadV2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_eli5_mlm_model_aldaalmira_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_eli5_mlm_model_aldaalmira_pipeline_en.md new file mode 100644 index 00000000000000..f007111ae4b229 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_eli5_mlm_model_aldaalmira_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_aldaalmira_pipeline pipeline RoBertaEmbeddings from aldaalmira +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_aldaalmira_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_aldaalmira_pipeline` is a English model originally trained by aldaalmira. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_aldaalmira_pipeline_en_5.5.0_3.0_1726857417478.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_aldaalmira_pipeline_en_5.5.0_3.0_1726857417478.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_eli5_mlm_model_aldaalmira_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_eli5_mlm_model_aldaalmira_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_aldaalmira_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/aldaalmira/my_awesome_eli5_mlm_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_feelwoo_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_feelwoo_pipeline_en.md new file mode 100644 index 00000000000000..9b19fa35998423 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_feelwoo_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_feelwoo_pipeline pipeline DistilBertForSequenceClassification from feelwoo +author: John Snow Labs +name: burmese_awesome_model_feelwoo_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_feelwoo_pipeline` is a English model originally trained by feelwoo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_feelwoo_pipeline_en_5.5.0_3.0_1726842278980.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_feelwoo_pipeline_en_5.5.0_3.0_1726842278980.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_feelwoo_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_feelwoo_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_feelwoo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/feelwoo/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_priority_3_en.md b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_priority_3_en.md new file mode 100644 index 00000000000000..8f3dfe47d356b6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-burmese_awesome_model_priority_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_priority_3 DistilBertForSequenceClassification from lenate +author: John Snow Labs +name: burmese_awesome_model_priority_3 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_priority_3` is a English model originally trained by lenate. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_priority_3_en_5.5.0_3.0_1726842284296.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_priority_3_en_5.5.0_3.0_1726842284296.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_priority_3","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_priority_3", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_priority_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lenate/my_awesome_model_priority_3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-cat_ner_spanish_5_en.md b/docs/_posts/ahmedlone127/2024-09-20-cat_ner_spanish_5_en.md new file mode 100644 index 00000000000000..ccf2f06b50b118 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-cat_ner_spanish_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cat_ner_spanish_5 RoBertaForTokenClassification from homersimpson +author: John Snow Labs +name: cat_ner_spanish_5 +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cat_ner_spanish_5` is a English model originally trained by homersimpson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cat_ner_spanish_5_en_5.5.0_3.0_1726847568889.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cat_ner_spanish_5_en_5.5.0_3.0_1726847568889.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("cat_ner_spanish_5","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("cat_ner_spanish_5", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cat_ner_spanish_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|462.3 MB| + +## References + +https://huggingface.co/homersimpson/cat-ner-es-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-classif_mmate_1_5_original_cont_3_sent_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-classif_mmate_1_5_original_cont_3_sent_pipeline_en.md new file mode 100644 index 00000000000000..66d9435a66f1c2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-classif_mmate_1_5_original_cont_3_sent_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English classif_mmate_1_5_original_cont_3_sent_pipeline pipeline BertForSequenceClassification from spneshaei +author: John Snow Labs +name: classif_mmate_1_5_original_cont_3_sent_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classif_mmate_1_5_original_cont_3_sent_pipeline` is a English model originally trained by spneshaei. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classif_mmate_1_5_original_cont_3_sent_pipeline_en_5.5.0_3.0_1726860348492.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classif_mmate_1_5_original_cont_3_sent_pipeline_en_5.5.0_3.0_1726860348492.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("classif_mmate_1_5_original_cont_3_sent_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("classif_mmate_1_5_original_cont_3_sent_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classif_mmate_1_5_original_cont_3_sent_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.1 MB| + +## References + +https://huggingface.co/spneshaei/classif_mmate_1_5_original_cont_3_sent + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-descr_class_two_cm_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-descr_class_two_cm_pipeline_en.md new file mode 100644 index 00000000000000..817d554780a329 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-descr_class_two_cm_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English descr_class_two_cm_pipeline pipeline DistilBertForSequenceClassification from BanananaMax +author: John Snow Labs +name: descr_class_two_cm_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`descr_class_two_cm_pipeline` is a English model originally trained by BanananaMax. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/descr_class_two_cm_pipeline_en_5.5.0_3.0_1726849050270.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/descr_class_two_cm_pipeline_en_5.5.0_3.0_1726849050270.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("descr_class_two_cm_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("descr_class_two_cm_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|descr_class_two_cm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/BanananaMax/descr_class_two_cm + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-dialogue_overfit_check_fold_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-dialogue_overfit_check_fold_4_pipeline_en.md new file mode 100644 index 00000000000000..27f22e8ca27b72 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-dialogue_overfit_check_fold_4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English dialogue_overfit_check_fold_4_pipeline pipeline DistilBertForSequenceClassification from SharonTudi +author: John Snow Labs +name: dialogue_overfit_check_fold_4_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dialogue_overfit_check_fold_4_pipeline` is a English model originally trained by SharonTudi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dialogue_overfit_check_fold_4_pipeline_en_5.5.0_3.0_1726848824190.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dialogue_overfit_check_fold_4_pipeline_en_5.5.0_3.0_1726848824190.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("dialogue_overfit_check_fold_4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("dialogue_overfit_check_fold_4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dialogue_overfit_check_fold_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/SharonTudi/DIALOGUE_overfit_check_fold_4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_emotion_ft_0520_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_emotion_ft_0520_pipeline_en.md new file mode 100644 index 00000000000000..474d7f2837b81b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_emotion_ft_0520_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_emotion_ft_0520_pipeline pipeline DistilBertForSequenceClassification from TangXiaoMing123 +author: John Snow Labs +name: distilbert_base_uncased_emotion_ft_0520_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_emotion_ft_0520_pipeline` is a English model originally trained by TangXiaoMing123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_emotion_ft_0520_pipeline_en_5.5.0_3.0_1726823710214.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_emotion_ft_0520_pipeline_en_5.5.0_3.0_1726823710214.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_emotion_ft_0520_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_emotion_ft_0520_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_emotion_ft_0520_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/TangXiaoMing123/distilbert-base-uncased_emotion_ft_0520 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_raota_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_raota_en.md new file mode 100644 index 00000000000000..3f832cdf294944 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_emotion_raota_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_raota DistilBertForSequenceClassification from raota +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_raota +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_raota` is a English model originally trained by raota. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_raota_en_5.5.0_3.0_1726792219323.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_raota_en_5.5.0_3.0_1726792219323.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_raota","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_raota", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_raota| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/raota/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_global_intent_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_global_intent_pipeline_en.md new file mode 100644 index 00000000000000..32ba0738bf430f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_finetuned_global_intent_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_global_intent_pipeline pipeline DistilBertForSequenceClassification from alibidaran +author: John Snow Labs +name: distilbert_base_uncased_finetuned_global_intent_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_global_intent_pipeline` is a English model originally trained by alibidaran. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_global_intent_pipeline_en_5.5.0_3.0_1726792490204.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_global_intent_pipeline_en_5.5.0_3.0_1726792490204.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_global_intent_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_global_intent_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_global_intent_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/alibidaran/distilbert-base-uncased-finetuned-Global_Intent + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut3_plprefix0stlarge_simsp100_clean100_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut3_plprefix0stlarge_simsp100_clean100_en.md new file mode 100644 index 00000000000000..dec7292f7ba5a9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st17sd_ut72ut3_plprefix0stlarge_simsp100_clean100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st17sd_ut72ut3_plprefix0stlarge_simsp100_clean100 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st17sd_ut72ut3_plprefix0stlarge_simsp100_clean100 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st17sd_ut72ut3_plprefix0stlarge_simsp100_clean100` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st17sd_ut72ut3_plprefix0stlarge_simsp100_clean100_en_5.5.0_3.0_1726830021037.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st17sd_ut72ut3_plprefix0stlarge_simsp100_clean100_en_5.5.0_3.0_1726830021037.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st17sd_ut72ut3_plprefix0stlarge_simsp100_clean100","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st17sd_ut72ut3_plprefix0stlarge_simsp100_clean100", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st17sd_ut72ut3_plprefix0stlarge_simsp100_clean100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st17sd_ut72ut3_PLPrefix0stlarge_simsp100_clean100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge19_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge19_simsp_pipeline_en.md new file mode 100644 index 00000000000000..c8ef4779424ee6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge19_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge19_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge19_simsp_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge19_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge19_simsp_pipeline_en_5.5.0_3.0_1726832863519.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge19_simsp_pipeline_en_5.5.0_3.0_1726832863519.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge19_simsp_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge19_simsp_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st19sd_ut72ut1_plprefix0stlarge19_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st19sd_ut72ut1_PLPrefix0stlarge19_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp_pipeline_en.md new file mode 100644 index 00000000000000..986f960458f999 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp_pipeline_en_5.5.0_3.0_1726848942016.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp_pipeline_en_5.5.0_3.0_1726848942016.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large90pfxnf_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st42sd_ut72ut1large90PfxNf_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge5_simsp100_clean200_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge5_simsp100_clean200_pipeline_en.md new file mode 100644 index 00000000000000..6aa882640b6788 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge5_simsp100_clean200_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge5_simsp100_clean200_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge5_simsp100_clean200_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge5_simsp100_clean200_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge5_simsp100_clean200_pipeline_en_5.5.0_3.0_1726829873885.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge5_simsp100_clean200_pipeline_en_5.5.0_3.0_1726829873885.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge5_simsp100_clean200_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge5_simsp100_clean200_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st5sd_ut72ut5_plprefix0stlarge5_simsp100_clean200_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st5sd_ut72ut5_PLPrefix0stlarge5_simsp100_clean200 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_tuvalu_zephyr_1shot_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_tuvalu_zephyr_1shot_en.md new file mode 100644 index 00000000000000..7e546894297c6d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_base_uncased_tuvalu_zephyr_1shot_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_tuvalu_zephyr_1shot DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_tuvalu_zephyr_1shot +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_tuvalu_zephyr_1shot` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_tuvalu_zephyr_1shot_en_5.5.0_3.0_1726848809553.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_tuvalu_zephyr_1shot_en_5.5.0_3.0_1726848809553.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_tuvalu_zephyr_1shot","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_tuvalu_zephyr_1shot", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_tuvalu_zephyr_1shot| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_tvl_zephyr_1shot \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_cola_384_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_cola_384_pipeline_en.md new file mode 100644 index 00000000000000..1f9f43b2f78f43 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sanskrit_saskta_glue_experiment_cola_384_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_cola_384_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_cola_384_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_cola_384_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_cola_384_pipeline_en_5.5.0_3.0_1726842090763.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_cola_384_pipeline_en_5.5.0_3.0_1726842090763.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_cola_384_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_cola_384_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_cola_384_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|111.8 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_cola_384 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-distilbert_sql_timeout_classifier_with_features_4096_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sql_timeout_classifier_with_features_4096_pipeline_en.md new file mode 100644 index 00000000000000..e8fffab08f7e88 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-distilbert_sql_timeout_classifier_with_features_4096_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sql_timeout_classifier_with_features_4096_pipeline pipeline DistilBertForSequenceClassification from Lifehouse +author: John Snow Labs +name: distilbert_sql_timeout_classifier_with_features_4096_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sql_timeout_classifier_with_features_4096_pipeline` is a English model originally trained by Lifehouse. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sql_timeout_classifier_with_features_4096_pipeline_en_5.5.0_3.0_1726823673687.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sql_timeout_classifier_with_features_4096_pipeline_en_5.5.0_3.0_1726823673687.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_sql_timeout_classifier_with_features_4096_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_sql_timeout_classifier_with_features_4096_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sql_timeout_classifier_with_features_4096_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|259.8 MB| + +## References + +https://huggingface.co/Lifehouse/distilbert-sql-timeout-classifier-with-features-4096 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-final_reviews_finetuned_model_epoch_05_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-final_reviews_finetuned_model_epoch_05_pipeline_en.md new file mode 100644 index 00000000000000..52617793095fde --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-final_reviews_finetuned_model_epoch_05_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English final_reviews_finetuned_model_epoch_05_pipeline pipeline DistilBertForSequenceClassification from rahulgaikwad007 +author: John Snow Labs +name: final_reviews_finetuned_model_epoch_05_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`final_reviews_finetuned_model_epoch_05_pipeline` is a English model originally trained by rahulgaikwad007. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/final_reviews_finetuned_model_epoch_05_pipeline_en_5.5.0_3.0_1726849057443.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/final_reviews_finetuned_model_epoch_05_pipeline_en_5.5.0_3.0_1726849057443.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("final_reviews_finetuned_model_epoch_05_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("final_reviews_finetuned_model_epoch_05_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|final_reviews_finetuned_model_epoch_05_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rahulgaikwad007/Final-reviews-Finetuned-model-Epoch-05 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_amazon_sample100000_text_robertamodel_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_amazon_sample100000_text_robertamodel_pipeline_en.md new file mode 100644 index 00000000000000..6226be74ef4b6d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_amazon_sample100000_text_robertamodel_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_amazon_sample100000_text_robertamodel_pipeline pipeline RoBertaForSequenceClassification from hsiuping +author: John Snow Labs +name: finetuning_amazon_sample100000_text_robertamodel_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_amazon_sample100000_text_robertamodel_pipeline` is a English model originally trained by hsiuping. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_amazon_sample100000_text_robertamodel_pipeline_en_5.5.0_3.0_1726851787746.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_amazon_sample100000_text_robertamodel_pipeline_en_5.5.0_3.0_1726851787746.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_amazon_sample100000_text_robertamodel_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_amazon_sample100000_text_robertamodel_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_amazon_sample100000_text_robertamodel_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/hsiuping/finetuning-amazon-sample100000-text-RoBERTamodel + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_bijupv_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_bijupv_pipeline_en.md new file mode 100644 index 00000000000000..ed9ed2a0f6699a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_bijupv_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_bijupv_pipeline pipeline DistilBertForSequenceClassification from BijuPV +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_bijupv_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_bijupv_pipeline` is a English model originally trained by BijuPV. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_bijupv_pipeline_en_5.5.0_3.0_1726792315100.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_bijupv_pipeline_en_5.5.0_3.0_1726792315100.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_bijupv_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_bijupv_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_bijupv_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/BijuPV/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_manikanta0002_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_manikanta0002_pipeline_en.md new file mode 100644 index 00000000000000..9f5a8049ceac9c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3000_samples_manikanta0002_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_manikanta0002_pipeline pipeline DistilBertForSequenceClassification from manikanta0002 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_manikanta0002_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_manikanta0002_pipeline` is a English model originally trained by manikanta0002. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_manikanta0002_pipeline_en_5.5.0_3.0_1726871729813.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_manikanta0002_pipeline_en_5.5.0_3.0_1726871729813.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_manikanta0002_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_manikanta0002_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_manikanta0002_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/manikanta0002/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3500_samples_train_yvillamil_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3500_samples_train_yvillamil_pipeline_en.md new file mode 100644 index 00000000000000..9d98864019925e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_3500_samples_train_yvillamil_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3500_samples_train_yvillamil_pipeline pipeline DistilBertForSequenceClassification from yvillamil +author: John Snow Labs +name: finetuning_sentiment_model_3500_samples_train_yvillamil_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3500_samples_train_yvillamil_pipeline` is a English model originally trained by yvillamil. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3500_samples_train_yvillamil_pipeline_en_5.5.0_3.0_1726823485063.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3500_samples_train_yvillamil_pipeline_en_5.5.0_3.0_1726823485063.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_3500_samples_train_yvillamil_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3500_samples_train_yvillamil_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3500_samples_train_yvillamil_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/yvillamil/finetuning-sentiment-model-3500-samples-train + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_5000_amazonbaby_samples_a01793005_en.md b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_5000_amazonbaby_samples_a01793005_en.md new file mode 100644 index 00000000000000..74866791bd6f31 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-finetuning_sentiment_model_5000_amazonbaby_samples_a01793005_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_5000_amazonbaby_samples_a01793005 DistilBertForSequenceClassification from A01793005 +author: John Snow Labs +name: finetuning_sentiment_model_5000_amazonbaby_samples_a01793005 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_5000_amazonbaby_samples_a01793005` is a English model originally trained by A01793005. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_5000_amazonbaby_samples_a01793005_en_5.5.0_3.0_1726848933026.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_5000_amazonbaby_samples_a01793005_en_5.5.0_3.0_1726848933026.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_5000_amazonbaby_samples_a01793005","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_5000_amazonbaby_samples_a01793005", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_5000_amazonbaby_samples_a01793005| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/A01793005/finetuning-sentiment-model-5000-amazonbaby-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-llm_b_hw1_en.md b/docs/_posts/ahmedlone127/2024-09-20-llm_b_hw1_en.md new file mode 100644 index 00000000000000..22ccb9858a8e6c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-llm_b_hw1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English llm_b_hw1 DistilBertForSequenceClassification from VincentYH +author: John Snow Labs +name: llm_b_hw1 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`llm_b_hw1` is a English model originally trained by VincentYH. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/llm_b_hw1_en_5.5.0_3.0_1726841448495.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/llm_b_hw1_en_5.5.0_3.0_1726841448495.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("llm_b_hw1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("llm_b_hw1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|llm_b_hw1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/VincentYH/LLM_B_HW1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-ltp_roberta_large_defaultltp_roberta_large_default_char_ins_en.md b/docs/_posts/ahmedlone127/2024-09-20-ltp_roberta_large_defaultltp_roberta_large_default_char_ins_en.md new file mode 100644 index 00000000000000..72e24aa317680d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-ltp_roberta_large_defaultltp_roberta_large_default_char_ins_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ltp_roberta_large_defaultltp_roberta_large_default_char_ins RoBertaForSequenceClassification from sara-nabhani +author: John Snow Labs +name: ltp_roberta_large_defaultltp_roberta_large_default_char_ins +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ltp_roberta_large_defaultltp_roberta_large_default_char_ins` is a English model originally trained by sara-nabhani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ltp_roberta_large_defaultltp_roberta_large_default_char_ins_en_5.5.0_3.0_1726804467034.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ltp_roberta_large_defaultltp_roberta_large_default_char_ins_en_5.5.0_3.0_1726804467034.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("ltp_roberta_large_defaultltp_roberta_large_default_char_ins","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("ltp_roberta_large_defaultltp_roberta_large_default_char_ins", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ltp_roberta_large_defaultltp_roberta_large_default_char_ins| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/sara-nabhani/ltp-roberta-large-defaultltp-roberta-large-default-char_ins \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-nlp_project_ajchang6_en.md b/docs/_posts/ahmedlone127/2024-09-20-nlp_project_ajchang6_en.md new file mode 100644 index 00000000000000..d8e884005178c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-nlp_project_ajchang6_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nlp_project_ajchang6 DistilBertForSequenceClassification from ajchang6 +author: John Snow Labs +name: nlp_project_ajchang6 +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nlp_project_ajchang6` is a English model originally trained by ajchang6. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp_project_ajchang6_en_5.5.0_3.0_1726832782547.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp_project_ajchang6_en_5.5.0_3.0_1726832782547.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("nlp_project_ajchang6","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("nlp_project_ajchang6", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nlp_project_ajchang6| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ajchang6/nlp_project \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-prueba_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-prueba_pipeline_en.md new file mode 100644 index 00000000000000..b3a78113f70625 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-prueba_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English prueba_pipeline pipeline DistilBertForSequenceClassification from rayosoftware +author: John Snow Labs +name: prueba_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`prueba_pipeline` is a English model originally trained by rayosoftware. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/prueba_pipeline_en_5.5.0_3.0_1726832568833.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/prueba_pipeline_en_5.5.0_3.0_1726832568833.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("prueba_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("prueba_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|prueba_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rayosoftware/prueba + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-ptcrawl_plus_legal_large_v1_7__checkpoint_last_en.md b/docs/_posts/ahmedlone127/2024-09-20-ptcrawl_plus_legal_large_v1_7__checkpoint_last_en.md new file mode 100644 index 00000000000000..ca7f020adb1d43 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-ptcrawl_plus_legal_large_v1_7__checkpoint_last_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ptcrawl_plus_legal_large_v1_7__checkpoint_last RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: ptcrawl_plus_legal_large_v1_7__checkpoint_last +date: 2024-09-20 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ptcrawl_plus_legal_large_v1_7__checkpoint_last` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ptcrawl_plus_legal_large_v1_7__checkpoint_last_en_5.5.0_3.0_1726858053742.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ptcrawl_plus_legal_large_v1_7__checkpoint_last_en_5.5.0_3.0_1726858053742.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("ptcrawl_plus_legal_large_v1_7__checkpoint_last","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("ptcrawl_plus_legal_large_v1_7__checkpoint_last","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ptcrawl_plus_legal_large_v1_7__checkpoint_last| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|842.9 MB| + +## References + +https://huggingface.co/eduagarcia-temp/ptcrawl_plus_legal_large_v1_7__checkpoint_last \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-qa_persian_bert_persian_farsi_zwnj_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-qa_persian_bert_persian_farsi_zwnj_base_pipeline_en.md new file mode 100644 index 00000000000000..18c2eef4763fde --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-qa_persian_bert_persian_farsi_zwnj_base_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English qa_persian_bert_persian_farsi_zwnj_base_pipeline pipeline BertForQuestionAnswering from makhataei +author: John Snow Labs +name: qa_persian_bert_persian_farsi_zwnj_base_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`qa_persian_bert_persian_farsi_zwnj_base_pipeline` is a English model originally trained by makhataei. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/qa_persian_bert_persian_farsi_zwnj_base_pipeline_en_5.5.0_3.0_1726820547412.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/qa_persian_bert_persian_farsi_zwnj_base_pipeline_en_5.5.0_3.0_1726820547412.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("qa_persian_bert_persian_farsi_zwnj_base_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("qa_persian_bert_persian_farsi_zwnj_base_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|qa_persian_bert_persian_farsi_zwnj_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|441.7 MB| + +## References + +https://huggingface.co/makhataei/qa-persian-bert-fa-zwnj-base + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-question_classification_minervabotteam_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-question_classification_minervabotteam_pipeline_en.md new file mode 100644 index 00000000000000..1bd9e174166876 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-question_classification_minervabotteam_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English question_classification_minervabotteam_pipeline pipeline DistilBertForSequenceClassification from MinervaBotTeam +author: John Snow Labs +name: question_classification_minervabotteam_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`question_classification_minervabotteam_pipeline` is a English model originally trained by MinervaBotTeam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/question_classification_minervabotteam_pipeline_en_5.5.0_3.0_1726849135896.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/question_classification_minervabotteam_pipeline_en_5.5.0_3.0_1726849135896.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("question_classification_minervabotteam_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("question_classification_minervabotteam_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|question_classification_minervabotteam_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/MinervaBotTeam/Question_Classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_base_legal_indian_courts_downstream_build_rr_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_legal_indian_courts_downstream_build_rr_pipeline_en.md new file mode 100644 index 00000000000000..d0828a5900afd6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_base_legal_indian_courts_downstream_build_rr_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_legal_indian_courts_downstream_build_rr_pipeline pipeline RoBertaForTokenClassification from MHGanainy +author: John Snow Labs +name: roberta_base_legal_indian_courts_downstream_build_rr_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_legal_indian_courts_downstream_build_rr_pipeline` is a English model originally trained by MHGanainy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_legal_indian_courts_downstream_build_rr_pipeline_en_5.5.0_3.0_1726862707099.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_legal_indian_courts_downstream_build_rr_pipeline_en_5.5.0_3.0_1726862707099.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_legal_indian_courts_downstream_build_rr_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_legal_indian_courts_downstream_build_rr_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_legal_indian_courts_downstream_build_rr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|464.2 MB| + +## References + +https://huggingface.co/MHGanainy/roberta-base-legal-indian-courts-downstream-build_rr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_large_polyglotner_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_polyglotner_en.md new file mode 100644 index 00000000000000..c36c700bde608e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_large_polyglotner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_polyglotner RoBertaForTokenClassification from CheccoCando +author: John Snow Labs +name: roberta_large_polyglotner +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_polyglotner` is a English model originally trained by CheccoCando. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_polyglotner_en_5.5.0_3.0_1726862590690.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_polyglotner_en_5.5.0_3.0_1726862590690.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_polyglotner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_polyglotner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_polyglotner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/CheccoCando/roberta-large_PolyglotNER \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_link_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_link_en.md new file mode 100644 index 00000000000000..488eef0111bb96 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_link_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_link RoBertaForTokenClassification from chanwoopark +author: John Snow Labs +name: roberta_link +date: 2024-09-20 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_link` is a English model originally trained by chanwoopark. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_link_en_5.5.0_3.0_1726846890436.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_link_en_5.5.0_3.0_1726846890436.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_link","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_link", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_link| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|462.4 MB| + +## References + +https://huggingface.co/chanwoopark/roberta-link \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-roberta_link_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-roberta_link_pipeline_en.md new file mode 100644 index 00000000000000..4840352e3d87f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-roberta_link_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_link_pipeline pipeline RoBertaForTokenClassification from chanwoopark +author: John Snow Labs +name: roberta_link_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_link_pipeline` is a English model originally trained by chanwoopark. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_link_pipeline_en_5.5.0_3.0_1726846913970.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_link_pipeline_en_5.5.0_3.0_1726846913970.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_link_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_link_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_link_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|462.5 MB| + +## References + +https://huggingface.co/chanwoopark/roberta-link + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-sentience_classification_score_pytorch_en.md b/docs/_posts/ahmedlone127/2024-09-20-sentience_classification_score_pytorch_en.md new file mode 100644 index 00000000000000..ab55a1e6a29482 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-sentience_classification_score_pytorch_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentience_classification_score_pytorch DistilBertForSequenceClassification from aeaee +author: John Snow Labs +name: sentience_classification_score_pytorch +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentience_classification_score_pytorch` is a English model originally trained by aeaee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentience_classification_score_pytorch_en_5.5.0_3.0_1726823615020.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentience_classification_score_pytorch_en_5.5.0_3.0_1726823615020.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentience_classification_score_pytorch","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentience_classification_score_pytorch", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentience_classification_score_pytorch| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.4 MB| + +## References + +https://huggingface.co/aeaee/SENTIENCE_Classification_Score_pytorch \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-topic_topic_random1_seed0_bernice_en.md b/docs/_posts/ahmedlone127/2024-09-20-topic_topic_random1_seed0_bernice_en.md new file mode 100644 index 00000000000000..1610008a4bdf15 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-topic_topic_random1_seed0_bernice_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English topic_topic_random1_seed0_bernice XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: topic_topic_random1_seed0_bernice +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`topic_topic_random1_seed0_bernice` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/topic_topic_random1_seed0_bernice_en_5.5.0_3.0_1726872646660.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/topic_topic_random1_seed0_bernice_en_5.5.0_3.0_1726872646660.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("topic_topic_random1_seed0_bernice","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("topic_topic_random1_seed0_bernice", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|topic_topic_random1_seed0_bernice| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|805.5 MB| + +## References + +https://huggingface.co/tweettemposhift/topic-topic_random1_seed0-bernice \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-uned_tfg_08_56_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-uned_tfg_08_56_pipeline_en.md new file mode 100644 index 00000000000000..4e30a32d1a8344 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-uned_tfg_08_56_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English uned_tfg_08_56_pipeline pipeline RoBertaForSequenceClassification from alexisdr +author: John Snow Labs +name: uned_tfg_08_56_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`uned_tfg_08_56_pipeline` is a English model originally trained by alexisdr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/uned_tfg_08_56_pipeline_en_5.5.0_3.0_1726851518349.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/uned_tfg_08_56_pipeline_en_5.5.0_3.0_1726851518349.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("uned_tfg_08_56_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("uned_tfg_08_56_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|uned_tfg_08_56_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|438.7 MB| + +## References + +https://huggingface.co/alexisdr/uned-tfg-08.56 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_small200speedysep6_spanish_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_small200speedysep6_spanish_pipeline_es.md new file mode 100644 index 00000000000000..2c6309623e8adc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_small200speedysep6_spanish_pipeline_es.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Castilian, Spanish whisper_small200speedysep6_spanish_pipeline pipeline WhisperForCTC from jessicadiveai +author: John Snow Labs +name: whisper_small200speedysep6_spanish_pipeline +date: 2024-09-20 +tags: [es, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small200speedysep6_spanish_pipeline` is a Castilian, Spanish model originally trained by jessicadiveai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small200speedysep6_spanish_pipeline_es_5.5.0_3.0_1726814396240.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small200speedysep6_spanish_pipeline_es_5.5.0_3.0_1726814396240.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small200speedysep6_spanish_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small200speedysep6_spanish_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small200speedysep6_spanish_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|1.7 GB| + +## References + +https://huggingface.co/jessicadiveai/whisper-small200speedysep6-es + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_small_hindi_ver2_hi.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_small_hindi_ver2_hi.md new file mode 100644 index 00000000000000..e429f7769b487c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_small_hindi_ver2_hi.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Hindi whisper_small_hindi_ver2 WhisperForCTC from saxenagauravhf +author: John Snow Labs +name: whisper_small_hindi_ver2 +date: 2024-09-20 +tags: [hi, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_ver2` is a Hindi model originally trained by saxenagauravhf. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_ver2_hi_5.5.0_3.0_1726814107573.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_ver2_hi_5.5.0_3.0_1726814107573.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_hindi_ver2","hi") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_hindi_ver2", "hi") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_ver2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|hi| +|Size:|1.7 GB| + +## References + +https://huggingface.co/saxenagauravhf/whisper-small-hi-ver2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_finetuned_minds14_jackismyshephard_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_finetuned_minds14_jackismyshephard_pipeline_en.md new file mode 100644 index 00000000000000..735b9aaf027621 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-whisper_tiny_finetuned_minds14_jackismyshephard_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_finetuned_minds14_jackismyshephard_pipeline pipeline WhisperForCTC from JackismyShephard +author: John Snow Labs +name: whisper_tiny_finetuned_minds14_jackismyshephard_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_finetuned_minds14_jackismyshephard_pipeline` is a English model originally trained by JackismyShephard. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_finetuned_minds14_jackismyshephard_pipeline_en_5.5.0_3.0_1726874333467.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_finetuned_minds14_jackismyshephard_pipeline_en_5.5.0_3.0_1726874333467.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_finetuned_minds14_jackismyshephard_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_finetuned_minds14_jackismyshephard_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_finetuned_minds14_jackismyshephard_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|389.7 MB| + +## References + +https://huggingface.co/JackismyShephard/whisper-tiny-finetuned-minds14 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_claimbuster_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_claimbuster_pipeline_en.md new file mode 100644 index 00000000000000..a2120b086dbddb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_claimbuster_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_claimbuster_pipeline pipeline XlmRoBertaForSequenceClassification from Nithiwat +author: John Snow Labs +name: xlm_roberta_base_claimbuster_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_claimbuster_pipeline` is a English model originally trained by Nithiwat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_claimbuster_pipeline_en_5.5.0_3.0_1726800482552.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_claimbuster_pipeline_en_5.5.0_3.0_1726800482552.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_claimbuster_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_claimbuster_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_claimbuster_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|845.2 MB| + +## References + +https://huggingface.co/Nithiwat/xlm-roberta-base_claimbuster + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_all_koroku_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_all_koroku_pipeline_en.md new file mode 100644 index 00000000000000..13d4f3954dad13 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xlm_roberta_base_finetuned_panx_all_koroku_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_koroku_pipeline pipeline XlmRoBertaForTokenClassification from koroku +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_koroku_pipeline +date: 2024-09-20 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_koroku_pipeline` is a English model originally trained by koroku. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_koroku_pipeline_en_5.5.0_3.0_1726844308912.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_koroku_pipeline_en_5.5.0_3.0_1726844308912.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_koroku_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_all_koroku_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_koroku_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|861.2 MB| + +## References + +https://huggingface.co/koroku/xlm-roberta-base-finetuned-panx-all + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-20-xnli_xlm_r_only_french_en.md b/docs/_posts/ahmedlone127/2024-09-20-xnli_xlm_r_only_french_en.md new file mode 100644 index 00000000000000..5357d6aea05870 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-20-xnli_xlm_r_only_french_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xnli_xlm_r_only_french XlmRoBertaForSequenceClassification from semindan +author: John Snow Labs +name: xnli_xlm_r_only_french +date: 2024-09-20 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xnli_xlm_r_only_french` is a English model originally trained by semindan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xnli_xlm_r_only_french_en_5.5.0_3.0_1726872639105.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xnli_xlm_r_only_french_en_5.5.0_3.0_1726872639105.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xnli_xlm_r_only_french","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xnli_xlm_r_only_french", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xnli_xlm_r_only_french| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|810.8 MB| + +## References + +https://huggingface.co/semindan/xnli_xlm_r_only_fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-2020_q1_50p_filtered_random_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-2020_q1_50p_filtered_random_pipeline_en.md new file mode 100644 index 00000000000000..0a4a5188206173 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-2020_q1_50p_filtered_random_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 2020_q1_50p_filtered_random_pipeline pipeline RoBertaEmbeddings from DouglasPontes +author: John Snow Labs +name: 2020_q1_50p_filtered_random_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q1_50p_filtered_random_pipeline` is a English model originally trained by DouglasPontes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q1_50p_filtered_random_pipeline_en_5.5.0_3.0_1726957738187.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q1_50p_filtered_random_pipeline_en_5.5.0_3.0_1726957738187.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("2020_q1_50p_filtered_random_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("2020_q1_50p_filtered_random_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q1_50p_filtered_random_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/DouglasPontes/2020-Q1-50p-filtered-random + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-angry_en.md b/docs/_posts/ahmedlone127/2024-09-21-angry_en.md new file mode 100644 index 00000000000000..b1deb7dc773640 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-angry_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English angry RoBertaEmbeddings from MatthijsN +author: John Snow Labs +name: angry +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`angry` is a English model originally trained by MatthijsN. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/angry_en_5.5.0_3.0_1726942707991.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/angry_en_5.5.0_3.0_1726942707991.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("angry","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("angry","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|angry| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/MatthijsN/angry \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-base_english_combined_v4_4_0_8_1e_05_divine_sweep_17_en.md b/docs/_posts/ahmedlone127/2024-09-21-base_english_combined_v4_4_0_8_1e_05_divine_sweep_17_en.md new file mode 100644 index 00000000000000..41ce14259acc91 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-base_english_combined_v4_4_0_8_1e_05_divine_sweep_17_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English base_english_combined_v4_4_0_8_1e_05_divine_sweep_17 WhisperForCTC from saahith +author: John Snow Labs +name: base_english_combined_v4_4_0_8_1e_05_divine_sweep_17 +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`base_english_combined_v4_4_0_8_1e_05_divine_sweep_17` is a English model originally trained by saahith. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/base_english_combined_v4_4_0_8_1e_05_divine_sweep_17_en_5.5.0_3.0_1726960713043.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/base_english_combined_v4_4_0_8_1e_05_divine_sweep_17_en_5.5.0_3.0_1726960713043.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("base_english_combined_v4_4_0_8_1e_05_divine_sweep_17","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("base_english_combined_v4_4_0_8_1e_05_divine_sweep_17", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|base_english_combined_v4_4_0_8_1e_05_divine_sweep_17| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|646.6 MB| + +## References + +https://huggingface.co/saahith/base.en-combined_v4-4-0-8-1e-05-divine-sweep-17 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_dutch_cased_finetuned_ner8_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_dutch_cased_finetuned_ner8_en.md new file mode 100644 index 00000000000000..d9ff2f732f00c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_dutch_cased_finetuned_ner8_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_dutch_cased_finetuned_ner8 BertForTokenClassification from Matthijsvanhof +author: John Snow Labs +name: bert_base_dutch_cased_finetuned_ner8 +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_dutch_cased_finetuned_ner8` is a English model originally trained by Matthijsvanhof. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_dutch_cased_finetuned_ner8_en_5.5.0_3.0_1726889649969.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_dutch_cased_finetuned_ner8_en_5.5.0_3.0_1726889649969.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_base_dutch_cased_finetuned_ner8","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_base_dutch_cased_finetuned_ner8", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_dutch_cased_finetuned_ner8| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|406.8 MB| + +## References + +https://huggingface.co/Matthijsvanhof/bert-base-dutch-cased-finetuned-NER8 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_dutch_cased_finetuned_ner8_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_dutch_cased_finetuned_ner8_pipeline_en.md new file mode 100644 index 00000000000000..44c8db91f0eee3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_dutch_cased_finetuned_ner8_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_dutch_cased_finetuned_ner8_pipeline pipeline BertForTokenClassification from Matthijsvanhof +author: John Snow Labs +name: bert_base_dutch_cased_finetuned_ner8_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_dutch_cased_finetuned_ner8_pipeline` is a English model originally trained by Matthijsvanhof. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_dutch_cased_finetuned_ner8_pipeline_en_5.5.0_3.0_1726889669063.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_dutch_cased_finetuned_ner8_pipeline_en_5.5.0_3.0_1726889669063.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_dutch_cased_finetuned_ner8_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_dutch_cased_finetuned_ner8_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_dutch_cased_finetuned_ner8_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.8 MB| + +## References + +https://huggingface.co/Matthijsvanhof/bert-base-dutch-cased-finetuned-NER8 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetuned_quac_1qahistory_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetuned_quac_1qahistory_pipeline_en.md new file mode 100644 index 00000000000000..4cccbffcfb7a01 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-bert_base_uncased_finetuned_quac_1qahistory_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_quac_1qahistory_pipeline pipeline BertForQuestionAnswering from Jellevdl +author: John Snow Labs +name: bert_base_uncased_finetuned_quac_1qahistory_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_quac_1qahistory_pipeline` is a English model originally trained by Jellevdl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_quac_1qahistory_pipeline_en_5.5.0_3.0_1726946760086.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_quac_1qahistory_pipeline_en_5.5.0_3.0_1726946760086.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_quac_1qahistory_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_quac_1qahistory_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_quac_1qahistory_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/Jellevdl/bert-base-uncased-finetuned-quac-1QAhistory + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_skotha_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_skotha_pipeline_en.md new file mode 100644 index 00000000000000..819513a34f05ad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_eli5_mlm_model_skotha_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_skotha_pipeline pipeline RoBertaEmbeddings from skotha +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_skotha_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_skotha_pipeline` is a English model originally trained by skotha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_skotha_pipeline_en_5.5.0_3.0_1726934240200.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_skotha_pipeline_en_5.5.0_3.0_1726934240200.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_eli5_mlm_model_skotha_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_eli5_mlm_model_skotha_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_skotha_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/skotha/my_awesome_eli5_mlm_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_ilanpar_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_ilanpar_pipeline_en.md new file mode 100644 index 00000000000000..554440c2c25a73 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-burmese_awesome_model_ilanpar_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_ilanpar_pipeline pipeline DistilBertForSequenceClassification from ilanPar +author: John Snow Labs +name: burmese_awesome_model_ilanpar_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_ilanpar_pipeline` is a English model originally trained by ilanPar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_ilanpar_pipeline_en_5.5.0_3.0_1726953071581.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_ilanpar_pipeline_en_5.5.0_3.0_1726953071581.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_ilanpar_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_ilanpar_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_ilanpar_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ilanPar/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_parksuna_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_parksuna_pipeline_en.md new file mode 100644 index 00000000000000..504e919db999f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_finetuned_emotion_parksuna_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_parksuna_pipeline pipeline DistilBertForSequenceClassification from parksuna +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_parksuna_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_parksuna_pipeline` is a English model originally trained by parksuna. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_parksuna_pipeline_en_5.5.0_3.0_1726953589868.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_parksuna_pipeline_en_5.5.0_3.0_1726953589868.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_parksuna_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_parksuna_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_parksuna_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/parksuna/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp_pipeline_en.md new file mode 100644 index 00000000000000..d3b11a8be5e1d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp_pipeline_en_5.5.0_3.0_1726884797954.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp_pipeline_en_5.5.0_3.0_1726884797954.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1large10pfxnf_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st42sd_ut72ut1large10PfxNf_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean_pipeline_en.md new file mode 100644 index 00000000000000..151a6f1fbca169 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean_pipeline_en_5.5.0_3.0_1726923830310.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean_pipeline_en_5.5.0_3.0_1726923830310.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_small_talk_zphr_0st_ut52ut5_ad7_simsp_clean + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distillbert_8_en.md b/docs/_posts/ahmedlone127/2024-09-21-distillbert_8_en.md new file mode 100644 index 00000000000000..315047012b77b7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distillbert_8_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distillbert_8 DistilBertForSequenceClassification from dzd828 +author: John Snow Labs +name: distillbert_8 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distillbert_8` is a English model originally trained by dzd828. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distillbert_8_en_5.5.0_3.0_1726924115259.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distillbert_8_en_5.5.0_3.0_1726924115259.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distillbert_8","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distillbert_8", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distillbert_8| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/dzd828/distillbert-8 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_finetuned_wikitext2_ethanoutangoun_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_finetuned_wikitext2_ethanoutangoun_pipeline_en.md new file mode 100644 index 00000000000000..be1a099c43fcc8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-distilroberta_base_finetuned_wikitext2_ethanoutangoun_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_finetuned_wikitext2_ethanoutangoun_pipeline pipeline RoBertaEmbeddings from ethanoutangoun +author: John Snow Labs +name: distilroberta_base_finetuned_wikitext2_ethanoutangoun_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_finetuned_wikitext2_ethanoutangoun_pipeline` is a English model originally trained by ethanoutangoun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_wikitext2_ethanoutangoun_pipeline_en_5.5.0_3.0_1726934785651.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_wikitext2_ethanoutangoun_pipeline_en_5.5.0_3.0_1726934785651.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilroberta_base_finetuned_wikitext2_ethanoutangoun_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilroberta_base_finetuned_wikitext2_ethanoutangoun_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_finetuned_wikitext2_ethanoutangoun_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/ethanoutangoun/distilroberta-base-finetuned-wikitext2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-final_finetuned_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-final_finetuned_model_pipeline_en.md new file mode 100644 index 00000000000000..b20e01b6174fae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-final_finetuned_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English final_finetuned_model_pipeline pipeline DistilBertForSequenceClassification from rahulgaikwad007 +author: John Snow Labs +name: final_finetuned_model_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`final_finetuned_model_pipeline` is a English model originally trained by rahulgaikwad007. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/final_finetuned_model_pipeline_en_5.5.0_3.0_1726924234603.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/final_finetuned_model_pipeline_en_5.5.0_3.0_1726924234603.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("final_finetuned_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("final_finetuned_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|final_finetuned_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rahulgaikwad007/Final-Finetuned-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-fine_tuned_datasetqas_squad_indonesian_with_indobert_base_uncased_without_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-fine_tuned_datasetqas_squad_indonesian_with_indobert_base_uncased_without_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline_en.md new file mode 100644 index 00000000000000..cc5e0eb55be495 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-fine_tuned_datasetqas_squad_indonesian_with_indobert_base_uncased_without_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English fine_tuned_datasetqas_squad_indonesian_with_indobert_base_uncased_without_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline pipeline BertForQuestionAnswering from muhammadravi251001 +author: John Snow Labs +name: fine_tuned_datasetqas_squad_indonesian_with_indobert_base_uncased_without_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tuned_datasetqas_squad_indonesian_with_indobert_base_uncased_without_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline` is a English model originally trained by muhammadravi251001. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tuned_datasetqas_squad_indonesian_with_indobert_base_uncased_without_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline_en_5.5.0_3.0_1726928954251.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tuned_datasetqas_squad_indonesian_with_indobert_base_uncased_without_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline_en_5.5.0_3.0_1726928954251.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("fine_tuned_datasetqas_squad_indonesian_with_indobert_base_uncased_without_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("fine_tuned_datasetqas_squad_indonesian_with_indobert_base_uncased_without_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tuned_datasetqas_squad_indonesian_with_indobert_base_uncased_without_ittl_without_freeze_lr_1e_05_muhammadravi251001_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|411.7 MB| + +## References + +https://huggingface.co/muhammadravi251001/fine-tuned-DatasetQAS-Squad-ID-with-indobert-base-uncased-without-ITTL-without-freeze-LR-1e-05 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-finetuned_distilbert_for_reddit_depression_detection_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-finetuned_distilbert_for_reddit_depression_detection_pipeline_en.md new file mode 100644 index 00000000000000..d3649033381f4f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-finetuned_distilbert_for_reddit_depression_detection_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuned_distilbert_for_reddit_depression_detection_pipeline pipeline DistilBertForSequenceClassification from sunF1ow3r +author: John Snow Labs +name: finetuned_distilbert_for_reddit_depression_detection_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_distilbert_for_reddit_depression_detection_pipeline` is a English model originally trained by sunF1ow3r. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_distilbert_for_reddit_depression_detection_pipeline_en_5.5.0_3.0_1726953725388.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_distilbert_for_reddit_depression_detection_pipeline_en_5.5.0_3.0_1726953725388.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuned_distilbert_for_reddit_depression_detection_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuned_distilbert_for_reddit_depression_detection_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_distilbert_for_reddit_depression_detection_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sunF1ow3r/finetuned-distilBERT-for-reddit-depression-detection + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_iossifpalli_en.md b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_iossifpalli_en.md new file mode 100644 index 00000000000000..84b82710b361df --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_iossifpalli_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_iossifpalli DistilBertForSequenceClassification from IossifPalli +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_iossifpalli +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_iossifpalli` is a English model originally trained by IossifPalli. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_iossifpalli_en_5.5.0_3.0_1726888596801.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_iossifpalli_en_5.5.0_3.0_1726888596801.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_iossifpalli","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_iossifpalli", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_iossifpalli| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/IossifPalli/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_ramanen_en.md b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_ramanen_en.md new file mode 100644 index 00000000000000..3b65927e824f89 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-finetuning_sentiment_model_3000_samples_ramanen_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_ramanen DistilBertForSequenceClassification from Ramanen +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_ramanen +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_ramanen` is a English model originally trained by Ramanen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ramanen_en_5.5.0_3.0_1726889108044.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ramanen_en_5.5.0_3.0_1726889108044.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_ramanen","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_ramanen", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_ramanen| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Ramanen/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-finnews_sentimentanalysis_v1_en.md b/docs/_posts/ahmedlone127/2024-09-21-finnews_sentimentanalysis_v1_en.md new file mode 100644 index 00000000000000..32b98899fd0b7e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-finnews_sentimentanalysis_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finnews_sentimentanalysis_v1 DistilBertForSequenceClassification from JoanParanoid +author: John Snow Labs +name: finnews_sentimentanalysis_v1 +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finnews_sentimentanalysis_v1` is a English model originally trained by JoanParanoid. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finnews_sentimentanalysis_v1_en_5.5.0_3.0_1726953427943.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finnews_sentimentanalysis_v1_en_5.5.0_3.0_1726953427943.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finnews_sentimentanalysis_v1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finnews_sentimentanalysis_v1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finnews_sentimentanalysis_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/JoanParanoid/FinNews_SentimentAnalysis_v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-ground_english_roberta_base_en.md b/docs/_posts/ahmedlone127/2024-09-21-ground_english_roberta_base_en.md new file mode 100644 index 00000000000000..5ca2e2128e81c5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-ground_english_roberta_base_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ground_english_roberta_base RoBertaEmbeddings from dreamerdeo +author: John Snow Labs +name: ground_english_roberta_base +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ground_english_roberta_base` is a English model originally trained by dreamerdeo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ground_english_roberta_base_en_5.5.0_3.0_1726934664039.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ground_english_roberta_base_en_5.5.0_3.0_1726934664039.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("ground_english_roberta_base","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("ground_english_roberta_base","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ground_english_roberta_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|464.1 MB| + +## References + +https://huggingface.co/dreamerdeo/ground-en-roberta-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-inria_roberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-inria_roberta_pipeline_en.md new file mode 100644 index 00000000000000..f9b2549877dddb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-inria_roberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English inria_roberta_pipeline pipeline RoBertaEmbeddings from subbareddyiiit +author: John Snow Labs +name: inria_roberta_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`inria_roberta_pipeline` is a English model originally trained by subbareddyiiit. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/inria_roberta_pipeline_en_5.5.0_3.0_1726942589300.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/inria_roberta_pipeline_en_5.5.0_3.0_1726942589300.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("inria_roberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("inria_roberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|inria_roberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.9 MB| + +## References + +https://huggingface.co/subbareddyiiit/inria_roberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-jerteh355sentneg4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-jerteh355sentneg4_pipeline_en.md new file mode 100644 index 00000000000000..b72a4b21462703 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-jerteh355sentneg4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English jerteh355sentneg4_pipeline pipeline RoBertaForSequenceClassification from Tanor +author: John Snow Labs +name: jerteh355sentneg4_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`jerteh355sentneg4_pipeline` is a English model originally trained by Tanor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/jerteh355sentneg4_pipeline_en_5.5.0_3.0_1726900943137.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/jerteh355sentneg4_pipeline_en_5.5.0_3.0_1726900943137.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("jerteh355sentneg4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("jerteh355sentneg4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jerteh355sentneg4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Tanor/Jerteh355SENTNEG4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-racism_finetuned_detests_wandb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-racism_finetuned_detests_wandb_pipeline_en.md new file mode 100644 index 00000000000000..42298787742899 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-racism_finetuned_detests_wandb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English racism_finetuned_detests_wandb_pipeline pipeline RoBertaForSequenceClassification from Pablo94 +author: John Snow Labs +name: racism_finetuned_detests_wandb_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`racism_finetuned_detests_wandb_pipeline` is a English model originally trained by Pablo94. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/racism_finetuned_detests_wandb_pipeline_en_5.5.0_3.0_1726940926547.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/racism_finetuned_detests_wandb_pipeline_en_5.5.0_3.0_1726940926547.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("racism_finetuned_detests_wandb_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("racism_finetuned_detests_wandb_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|racism_finetuned_detests_wandb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|449.7 MB| + +## References + +https://huggingface.co/Pablo94/racism-finetuned-detests-wandb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-robbert_cosmetic_v2_finetuned_en.md b/docs/_posts/ahmedlone127/2024-09-21-robbert_cosmetic_v2_finetuned_en.md new file mode 100644 index 00000000000000..50c26726e4f5d6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-robbert_cosmetic_v2_finetuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English robbert_cosmetic_v2_finetuned RoBertaEmbeddings from ymelka +author: John Snow Labs +name: robbert_cosmetic_v2_finetuned +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robbert_cosmetic_v2_finetuned` is a English model originally trained by ymelka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robbert_cosmetic_v2_finetuned_en_5.5.0_3.0_1726934426072.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robbert_cosmetic_v2_finetuned_en_5.5.0_3.0_1726934426072.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("robbert_cosmetic_v2_finetuned","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("robbert_cosmetic_v2_finetuned","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robbert_cosmetic_v2_finetuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|464.8 MB| + +## References + +https://huggingface.co/ymelka/robbert-cosmetic-v2-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberdou_100k_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberdou_100k_en.md new file mode 100644 index 00000000000000..256b0e36581421 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberdou_100k_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberdou_100k RoBertaEmbeddings from flavio-nakasato +author: John Snow Labs +name: roberdou_100k +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberdou_100k` is a English model originally trained by flavio-nakasato. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberdou_100k_en_5.5.0_3.0_1726882094775.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberdou_100k_en_5.5.0_3.0_1726882094775.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberdou_100k","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberdou_100k","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberdou_100k| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|634.4 MB| + +## References + +https://huggingface.co/flavio-nakasato/roberdou_100k \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberdou_100k_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberdou_100k_pipeline_en.md new file mode 100644 index 00000000000000..f23be44ffc9b90 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberdou_100k_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberdou_100k_pipeline pipeline RoBertaEmbeddings from flavio-nakasato +author: John Snow Labs +name: roberdou_100k_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberdou_100k_pipeline` is a English model originally trained by flavio-nakasato. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberdou_100k_pipeline_en_5.5.0_3.0_1726882127665.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberdou_100k_pipeline_en_5.5.0_3.0_1726882127665.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberdou_100k_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberdou_100k_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberdou_100k_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|634.4 MB| + +## References + +https://huggingface.co/flavio-nakasato/roberdou_100k + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_base_last_2_chars_acl2023_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_last_2_chars_acl2023_pipeline_en.md new file mode 100644 index 00000000000000..5d4f649b2a2581 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_base_last_2_chars_acl2023_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_last_2_chars_acl2023_pipeline pipeline RoBertaEmbeddings from hitachi-nlp +author: John Snow Labs +name: roberta_base_last_2_chars_acl2023_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_last_2_chars_acl2023_pipeline` is a English model originally trained by hitachi-nlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_last_2_chars_acl2023_pipeline_en_5.5.0_3.0_1726934716722.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_last_2_chars_acl2023_pipeline_en_5.5.0_3.0_1726934716722.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_last_2_chars_acl2023_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_last_2_chars_acl2023_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_last_2_chars_acl2023_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.3 MB| + +## References + +https://huggingface.co/hitachi-nlp/roberta-base_last-2-chars_acl2023 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_large_finetuned_chunking_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_large_finetuned_chunking_pipeline_en.md new file mode 100644 index 00000000000000..f592baadcc022c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_large_finetuned_chunking_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_finetuned_chunking_pipeline pipeline BertForTokenClassification from mariolinml +author: John Snow Labs +name: roberta_large_finetuned_chunking_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_finetuned_chunking_pipeline` is a English model originally trained by mariolinml. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_chunking_pipeline_en_5.5.0_3.0_1726889568755.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_finetuned_chunking_pipeline_en_5.5.0_3.0_1726889568755.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_finetuned_chunking_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_finetuned_chunking_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_finetuned_chunking_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/mariolinml/roberta-large-finetuned-chunking + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_large_tweet_topic_multi_2020_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_large_tweet_topic_multi_2020_pipeline_en.md new file mode 100644 index 00000000000000..6ef70f3e3a8267 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_large_tweet_topic_multi_2020_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_tweet_topic_multi_2020_pipeline pipeline RoBertaForSequenceClassification from cardiffnlp +author: John Snow Labs +name: roberta_large_tweet_topic_multi_2020_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_tweet_topic_multi_2020_pipeline` is a English model originally trained by cardiffnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_tweet_topic_multi_2020_pipeline_en_5.5.0_3.0_1726941042196.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_tweet_topic_multi_2020_pipeline_en_5.5.0_3.0_1726941042196.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_tweet_topic_multi_2020_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_tweet_topic_multi_2020_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_tweet_topic_multi_2020_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/cardiffnlp/roberta-large-tweet-topic-multi-2020 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_news_cnn_dailymail_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_news_cnn_dailymail_en.md new file mode 100644 index 00000000000000..f56d652c79d774 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_news_cnn_dailymail_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_news_cnn_dailymail RoBertaEmbeddings from isarth +author: John Snow Labs +name: roberta_news_cnn_dailymail +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_news_cnn_dailymail` is a English model originally trained by isarth. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_news_cnn_dailymail_en_5.5.0_3.0_1726943452670.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_news_cnn_dailymail_en_5.5.0_3.0_1726943452670.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_news_cnn_dailymail","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_news_cnn_dailymail","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_news_cnn_dailymail| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.9 MB| + +## References + +https://huggingface.co/isarth/roberta-news-cnn_dailymail \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_retrained_250k_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_retrained_250k_en.md new file mode 100644 index 00000000000000..f0bceac3c712a6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_retrained_250k_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_retrained_250k RoBertaEmbeddings from bitsanlp +author: John Snow Labs +name: roberta_retrained_250k +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_retrained_250k` is a English model originally trained by bitsanlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_retrained_250k_en_5.5.0_3.0_1726934064605.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_retrained_250k_en_5.5.0_3.0_1726934064605.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_retrained_250k","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_retrained_250k","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_retrained_250k| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.8 MB| + +## References + +https://huggingface.co/bitsanlp/roberta-retrained-250k \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-roberta_retrained_kunalr63_en.md b/docs/_posts/ahmedlone127/2024-09-21-roberta_retrained_kunalr63_en.md new file mode 100644 index 00000000000000..f28be9ffda0e3a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-roberta_retrained_kunalr63_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_retrained_kunalr63 RoBertaEmbeddings from kunalr63 +author: John Snow Labs +name: roberta_retrained_kunalr63 +date: 2024-09-21 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_retrained_kunalr63` is a English model originally trained by kunalr63. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_retrained_kunalr63_en_5.5.0_3.0_1726882347322.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_retrained_kunalr63_en_5.5.0_3.0_1726882347322.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_retrained_kunalr63","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_retrained_kunalr63","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_retrained_kunalr63| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/kunalr63/roberta-retrained \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-rubert_tiny2_cedr_russian_emotion_pipeline_ru.md b/docs/_posts/ahmedlone127/2024-09-21-rubert_tiny2_cedr_russian_emotion_pipeline_ru.md new file mode 100644 index 00000000000000..621dcdfb3f75ef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-rubert_tiny2_cedr_russian_emotion_pipeline_ru.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Russian rubert_tiny2_cedr_russian_emotion_pipeline pipeline BertForSequenceClassification from seara +author: John Snow Labs +name: rubert_tiny2_cedr_russian_emotion_pipeline +date: 2024-09-21 +tags: [ru, open_source, pipeline, onnx] +task: Text Classification +language: ru +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rubert_tiny2_cedr_russian_emotion_pipeline` is a Russian model originally trained by seara. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rubert_tiny2_cedr_russian_emotion_pipeline_ru_5.5.0_3.0_1726955077704.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rubert_tiny2_cedr_russian_emotion_pipeline_ru_5.5.0_3.0_1726955077704.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("rubert_tiny2_cedr_russian_emotion_pipeline", lang = "ru") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("rubert_tiny2_cedr_russian_emotion_pipeline", lang = "ru") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rubert_tiny2_cedr_russian_emotion_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ru| +|Size:|109.5 MB| + +## References + +https://huggingface.co/seara/rubert-tiny2-cedr-russian-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_arabertmo_base_v2_ar.md b/docs/_posts/ahmedlone127/2024-09-21-sent_arabertmo_base_v2_ar.md new file mode 100644 index 00000000000000..a6f791f66f665f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_arabertmo_base_v2_ar.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Arabic sent_arabertmo_base_v2 BertSentenceEmbeddings from Ebtihal +author: John Snow Labs +name: sent_arabertmo_base_v2 +date: 2024-09-21 +tags: [ar, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_arabertmo_base_v2` is a Arabic model originally trained by Ebtihal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v2_ar_5.5.0_3.0_1726941884078.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_arabertmo_base_v2_ar_5.5.0_3.0_1726941884078.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_arabertmo_base_v2","ar") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_arabertmo_base_v2","ar") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_arabertmo_base_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|ar| +|Size:|407.9 MB| + +## References + +https://huggingface.co/Ebtihal/AraBertMo_base_V2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_italian_cased_osiria_pipeline_it.md b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_italian_cased_osiria_pipeline_it.md new file mode 100644 index 00000000000000..8b28866da63fd2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_bert_base_italian_cased_osiria_pipeline_it.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Italian sent_bert_base_italian_cased_osiria_pipeline pipeline BertSentenceEmbeddings from osiria +author: John Snow Labs +name: sent_bert_base_italian_cased_osiria_pipeline +date: 2024-09-21 +tags: [it, open_source, pipeline, onnx] +task: Embeddings +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_italian_cased_osiria_pipeline` is a Italian model originally trained by osiria. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_italian_cased_osiria_pipeline_it_5.5.0_3.0_1726898316037.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_italian_cased_osiria_pipeline_it_5.5.0_3.0_1726898316037.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_italian_cased_osiria_pipeline", lang = "it") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_italian_cased_osiria_pipeline", lang = "it") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_italian_cased_osiria_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|it| +|Size:|409.5 MB| + +## References + +https://huggingface.co/osiria/bert-base-italian-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-sent_defsent_bert_base_uncased_mean_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-sent_defsent_bert_base_uncased_mean_pipeline_en.md new file mode 100644 index 00000000000000..00c35f6fdbe4ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-sent_defsent_bert_base_uncased_mean_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_defsent_bert_base_uncased_mean_pipeline pipeline BertSentenceEmbeddings from cl-nagoya +author: John Snow Labs +name: sent_defsent_bert_base_uncased_mean_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_defsent_bert_base_uncased_mean_pipeline` is a English model originally trained by cl-nagoya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_defsent_bert_base_uncased_mean_pipeline_en_5.5.0_3.0_1726941325000.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_defsent_bert_base_uncased_mean_pipeline_en_5.5.0_3.0_1726941325000.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_defsent_bert_base_uncased_mean_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_defsent_bert_base_uncased_mean_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_defsent_bert_base_uncased_mean_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/cl-nagoya/defsent-bert-base-uncased-mean + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-spanish_euph_classifier_final_en.md b/docs/_posts/ahmedlone127/2024-09-21-spanish_euph_classifier_final_en.md new file mode 100644 index 00000000000000..daa4dba09aebff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-spanish_euph_classifier_final_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English spanish_euph_classifier_final DistilBertForSequenceClassification from nhankins +author: John Snow Labs +name: spanish_euph_classifier_final +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`spanish_euph_classifier_final` is a English model originally trained by nhankins. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spanish_euph_classifier_final_en_5.5.0_3.0_1726953595347.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spanish_euph_classifier_final_en_5.5.0_3.0_1726953595347.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("spanish_euph_classifier_final","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("spanish_euph_classifier_final", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spanish_euph_classifier_final| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/nhankins/es_euph_classifier_final \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-t_6_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-t_6_pipeline_en.md new file mode 100644 index 00000000000000..e64e7102d2ab1c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-t_6_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English t_6_pipeline pipeline RoBertaForSequenceClassification from Pablojmed +author: John Snow Labs +name: t_6_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`t_6_pipeline` is a English model originally trained by Pablojmed. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/t_6_pipeline_en_5.5.0_3.0_1726940696169.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/t_6_pipeline_en_5.5.0_3.0_1726940696169.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("t_6_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("t_6_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|t_6_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|438.3 MB| + +## References + +https://huggingface.co/Pablojmed/t_6 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-toxic_comment_model_toxicity_ft_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-toxic_comment_model_toxicity_ft_pipeline_en.md new file mode 100644 index 00000000000000..21fceb423bf0a0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-toxic_comment_model_toxicity_ft_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English toxic_comment_model_toxicity_ft_pipeline pipeline DistilBertForSequenceClassification from fatmhd1995 +author: John Snow Labs +name: toxic_comment_model_toxicity_ft_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`toxic_comment_model_toxicity_ft_pipeline` is a English model originally trained by fatmhd1995. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/toxic_comment_model_toxicity_ft_pipeline_en_5.5.0_3.0_1726953469297.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/toxic_comment_model_toxicity_ft_pipeline_en_5.5.0_3.0_1726953469297.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("toxic_comment_model_toxicity_ft_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("toxic_comment_model_toxicity_ft_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|toxic_comment_model_toxicity_ft_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/fatmhd1995/toxic-comment-model-TOXICITY-FT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_base_common_voice_16_portuguese_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_common_voice_16_portuguese_pipeline_en.md new file mode 100644 index 00000000000000..4d4eddc42b3255 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_base_common_voice_16_portuguese_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_base_common_voice_16_portuguese_pipeline pipeline WhisperForCTC from thiagobarbosa +author: John Snow Labs +name: whisper_base_common_voice_16_portuguese_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_common_voice_16_portuguese_pipeline` is a English model originally trained by thiagobarbosa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_common_voice_16_portuguese_pipeline_en_5.5.0_3.0_1726911300347.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_common_voice_16_portuguese_pipeline_en_5.5.0_3.0_1726911300347.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_base_common_voice_16_portuguese_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_base_common_voice_16_portuguese_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_common_voice_16_portuguese_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|641.9 MB| + +## References + +https://huggingface.co/thiagobarbosa/whisper-base-common-voice-16-pt + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_hungarian_small_augmented_pipeline_hu.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_hungarian_small_augmented_pipeline_hu.md new file mode 100644 index 00000000000000..fb698739cf0af1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_hungarian_small_augmented_pipeline_hu.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Hungarian whisper_hungarian_small_augmented_pipeline pipeline WhisperForCTC from ALM +author: John Snow Labs +name: whisper_hungarian_small_augmented_pipeline +date: 2024-09-21 +tags: [hu, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: hu +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_hungarian_small_augmented_pipeline` is a Hungarian model originally trained by ALM. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_hungarian_small_augmented_pipeline_hu_5.5.0_3.0_1726891298005.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_hungarian_small_augmented_pipeline_hu_5.5.0_3.0_1726891298005.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_hungarian_small_augmented_pipeline", lang = "hu") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_hungarian_small_augmented_pipeline", lang = "hu") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_hungarian_small_augmented_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hu| +|Size:|1.7 GB| + +## References + +https://huggingface.co/ALM/whisper-hu-small-augmented + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_egy_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_egy_en.md new file mode 100644 index 00000000000000..dfae756ac8278f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_egy_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_egy WhisperForCTC from HuggingPanda +author: John Snow Labs +name: whisper_small_egy +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_egy` is a English model originally trained by HuggingPanda. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_egy_en_5.5.0_3.0_1726939650502.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_egy_en_5.5.0_3.0_1726939650502.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_egy","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_egy", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_egy| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/HuggingPanda/whisper-small-egy \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_jamin20_en.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_jamin20_en.md new file mode 100644 index 00000000000000..3a0078c85827b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_small_hindi_jamin20_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_hindi_jamin20 WhisperForCTC from Jamin20 +author: John Snow Labs +name: whisper_small_hindi_jamin20 +date: 2024-09-21 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_jamin20` is a English model originally trained by Jamin20. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_jamin20_en_5.5.0_3.0_1726911717269.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_jamin20_en_5.5.0_3.0_1726911717269.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_hindi_jamin20","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_hindi_jamin20", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_jamin20| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Jamin20/whisper-small-hi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_kannada_kn.md b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_kannada_kn.md new file mode 100644 index 00000000000000..fe672965cf9f49 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-whisper_tiny_kannada_kn.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Kannada whisper_tiny_kannada WhisperForCTC from parambharat +author: John Snow Labs +name: whisper_tiny_kannada +date: 2024-09-21 +tags: [kn, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: kn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_kannada` is a Kannada model originally trained by parambharat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_kannada_kn_5.5.0_3.0_1726891918307.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_kannada_kn_5.5.0_3.0_1726891918307.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_tiny_kannada","kn") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_tiny_kannada", "kn") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_kannada| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|kn| +|Size:|391.1 MB| + +## References + +https://huggingface.co/parambharat/whisper-tiny-kn \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-withinapps_ndd_mantisbt_test_content_cwadj_en.md b/docs/_posts/ahmedlone127/2024-09-21-withinapps_ndd_mantisbt_test_content_cwadj_en.md new file mode 100644 index 00000000000000..11ac93cfc9e79d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-withinapps_ndd_mantisbt_test_content_cwadj_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English withinapps_ndd_mantisbt_test_content_cwadj DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: withinapps_ndd_mantisbt_test_content_cwadj +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`withinapps_ndd_mantisbt_test_content_cwadj` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/withinapps_ndd_mantisbt_test_content_cwadj_en_5.5.0_3.0_1726889061605.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/withinapps_ndd_mantisbt_test_content_cwadj_en_5.5.0_3.0_1726889061605.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("withinapps_ndd_mantisbt_test_content_cwadj","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("withinapps_ndd_mantisbt_test_content_cwadj", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|withinapps_ndd_mantisbt_test_content_cwadj| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/WITHINAPPS_NDD-mantisbt_test-content-CWAdj \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-withinapps_ndd_mantisbt_test_content_cwadj_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-withinapps_ndd_mantisbt_test_content_cwadj_pipeline_en.md new file mode 100644 index 00000000000000..95e2cd62ef8888 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-withinapps_ndd_mantisbt_test_content_cwadj_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English withinapps_ndd_mantisbt_test_content_cwadj_pipeline pipeline DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: withinapps_ndd_mantisbt_test_content_cwadj_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`withinapps_ndd_mantisbt_test_content_cwadj_pipeline` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/withinapps_ndd_mantisbt_test_content_cwadj_pipeline_en_5.5.0_3.0_1726889074296.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/withinapps_ndd_mantisbt_test_content_cwadj_pipeline_en_5.5.0_3.0_1726889074296.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("withinapps_ndd_mantisbt_test_content_cwadj_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("withinapps_ndd_mantisbt_test_content_cwadj_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|withinapps_ndd_mantisbt_test_content_cwadj_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/WITHINAPPS_NDD-mantisbt_test-content-CWAdj + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_balance_mixed_aug_insert_w2v_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_balance_mixed_aug_insert_w2v_en.md new file mode 100644 index 00000000000000..933ce9a2d1fc4c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_balance_mixed_aug_insert_w2v_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_balance_mixed_aug_insert_w2v XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_balance_mixed_aug_insert_w2v +date: 2024-09-21 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_balance_mixed_aug_insert_w2v` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_balance_mixed_aug_insert_w2v_en_5.5.0_3.0_1726932368304.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_balance_mixed_aug_insert_w2v_en_5.5.0_3.0_1726932368304.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_balance_mixed_aug_insert_w2v","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_balance_mixed_aug_insert_w2v", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_balance_mixed_aug_insert_w2v| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|795.1 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Balance_Mixed-aug_insert_w2v \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_all_sungkwangjoong_en.md b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_all_sungkwangjoong_en.md new file mode 100644 index 00000000000000..28a19b0a01b796 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xlm_roberta_base_finetuned_panx_all_sungkwangjoong_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_sungkwangjoong XlmRoBertaForTokenClassification from sungkwangjoong +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_sungkwangjoong +date: 2024-09-21 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_sungkwangjoong` is a English model originally trained by sungkwangjoong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_sungkwangjoong_en_5.5.0_3.0_1726896607351.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_sungkwangjoong_en_5.5.0_3.0_1726896607351.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_sungkwangjoong","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_sungkwangjoong", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_sungkwangjoong| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|850.1 MB| + +## References + +https://huggingface.co/sungkwangjoong/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-21-xnli_xlm_r_only_russian_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-21-xnli_xlm_r_only_russian_pipeline_en.md new file mode 100644 index 00000000000000..b8c55ea8e96c5e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-21-xnli_xlm_r_only_russian_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xnli_xlm_r_only_russian_pipeline pipeline XlmRoBertaForSequenceClassification from semindan +author: John Snow Labs +name: xnli_xlm_r_only_russian_pipeline +date: 2024-09-21 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xnli_xlm_r_only_russian_pipeline` is a English model originally trained by semindan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xnli_xlm_r_only_russian_pipeline_en_5.5.0_3.0_1726933699607.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xnli_xlm_r_only_russian_pipeline_en_5.5.0_3.0_1726933699607.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xnli_xlm_r_only_russian_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xnli_xlm_r_only_russian_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xnli_xlm_r_only_russian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|812.7 MB| + +## References + +https://huggingface.co/semindan/xnli_xlm_r_only_ru + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-2504separado3_en.md b/docs/_posts/ahmedlone127/2024-09-22-2504separado3_en.md new file mode 100644 index 00000000000000..48b05d59a04e84 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-2504separado3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 2504separado3 RoBertaForSequenceClassification from adriansanz +author: John Snow Labs +name: 2504separado3 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2504separado3` is a English model originally trained by adriansanz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2504separado3_en_5.5.0_3.0_1726972448562.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2504separado3_en_5.5.0_3.0_1726972448562.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("2504separado3","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("2504separado3", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2504separado3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|448.4 MB| + +## References + +https://huggingface.co/adriansanz/2504separado3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-adrv2_en.md b/docs/_posts/ahmedlone127/2024-09-22-adrv2_en.md new file mode 100644 index 00000000000000..24ebcea23df9bc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-adrv2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English adrv2 RoBertaForSequenceClassification from bqr5tf +author: John Snow Labs +name: adrv2 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`adrv2` is a English model originally trained by bqr5tf. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/adrv2_en_5.5.0_3.0_1726971790656.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/adrv2_en_5.5.0_3.0_1726971790656.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("adrv2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("adrv2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|adrv2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/bqr5tf/ADRv2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-ai_text_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-ai_text_model_pipeline_en.md new file mode 100644 index 00000000000000..e22144e8b4750f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-ai_text_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ai_text_model_pipeline pipeline BertForSequenceClassification from KaranNag +author: John Snow Labs +name: ai_text_model_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ai_text_model_pipeline` is a English model originally trained by KaranNag. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ai_text_model_pipeline_en_5.5.0_3.0_1727032354740.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ai_text_model_pipeline_en_5.5.0_3.0_1727032354740.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ai_text_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ai_text_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ai_text_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/KaranNag/Ai_text_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-anakilang_kelas_ai_en.md b/docs/_posts/ahmedlone127/2024-09-22-anakilang_kelas_ai_en.md new file mode 100644 index 00000000000000..5aac0cd671dd1b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-anakilang_kelas_ai_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English anakilang_kelas_ai RoBertaForSequenceClassification from GilarYa +author: John Snow Labs +name: anakilang_kelas_ai +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`anakilang_kelas_ai` is a English model originally trained by GilarYa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/anakilang_kelas_ai_en_5.5.0_3.0_1727017587303.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/anakilang_kelas_ai_en_5.5.0_3.0_1727017587303.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("anakilang_kelas_ai","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("anakilang_kelas_ai", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|anakilang_kelas_ai| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|432.6 MB| + +## References + +https://huggingface.co/GilarYa/anakilang-kelas-ai \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-anakilang_kelas_ai_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-anakilang_kelas_ai_pipeline_en.md new file mode 100644 index 00000000000000..dbaedacd8bca88 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-anakilang_kelas_ai_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English anakilang_kelas_ai_pipeline pipeline RoBertaForSequenceClassification from GilarYa +author: John Snow Labs +name: anakilang_kelas_ai_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`anakilang_kelas_ai_pipeline` is a English model originally trained by GilarYa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/anakilang_kelas_ai_pipeline_en_5.5.0_3.0_1727017611434.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/anakilang_kelas_ai_pipeline_en_5.5.0_3.0_1727017611434.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("anakilang_kelas_ai_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("anakilang_kelas_ai_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|anakilang_kelas_ai_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|432.6 MB| + +## References + +https://huggingface.co/GilarYa/anakilang-kelas-ai + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-authdetect_test_en.md b/docs/_posts/ahmedlone127/2024-09-22-authdetect_test_en.md new file mode 100644 index 00000000000000..2a31badf994be0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-authdetect_test_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English authdetect_test RoBertaForSequenceClassification from mmochtak +author: John Snow Labs +name: authdetect_test +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`authdetect_test` is a English model originally trained by mmochtak. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/authdetect_test_en_5.5.0_3.0_1726967649157.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/authdetect_test_en_5.5.0_3.0_1726967649157.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("authdetect_test","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("authdetect_test", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|authdetect_test| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|459.6 MB| + +## References + +https://huggingface.co/mmochtak/authdetect_test \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_runaways_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_runaways_pipeline_en.md new file mode 100644 index 00000000000000..b59ac425089ef6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_cased_finetuned_runaways_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_cased_finetuned_runaways_pipeline pipeline BertForQuestionAnswering from Nadav +author: John Snow Labs +name: bert_base_cased_finetuned_runaways_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_finetuned_runaways_pipeline` is a English model originally trained by Nadav. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_runaways_pipeline_en_5.5.0_3.0_1726991687595.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_runaways_pipeline_en_5.5.0_3.0_1726991687595.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_cased_finetuned_runaways_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_cased_finetuned_runaways_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_finetuned_runaways_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/Nadav/bert-base-cased-finetuned-runaways + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_local_results_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_local_results_en.md new file mode 100644 index 00000000000000..42aff594170361 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_local_results_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_local_results BertForSequenceClassification from serpapi +author: John Snow Labs +name: bert_base_local_results +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_local_results` is a English model originally trained by serpapi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_local_results_en_5.5.0_3.0_1726976498295.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_local_results_en_5.5.0_3.0_1726976498295.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_local_results","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_local_results", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_local_results| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/serpapi/bert-base-local-results \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_multilingual_uncased_sentiment_navenprasad_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_multilingual_uncased_sentiment_navenprasad_pipeline_xx.md new file mode 100644 index 00000000000000..0c0da7d706ad51 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_multilingual_uncased_sentiment_navenprasad_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_uncased_sentiment_navenprasad_pipeline pipeline BertForSequenceClassification from navenprasad +author: John Snow Labs +name: bert_base_multilingual_uncased_sentiment_navenprasad_pipeline +date: 2024-09-22 +tags: [xx, open_source, pipeline, onnx] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_uncased_sentiment_navenprasad_pipeline` is a Multilingual model originally trained by navenprasad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_sentiment_navenprasad_pipeline_xx_5.5.0_3.0_1727034487420.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_sentiment_navenprasad_pipeline_xx_5.5.0_3.0_1727034487420.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_multilingual_uncased_sentiment_navenprasad_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_multilingual_uncased_sentiment_navenprasad_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_uncased_sentiment_navenprasad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|627.8 MB| + +## References + +https://huggingface.co/navenprasad/bert-base-multilingual-uncased-sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_320240905172321_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_320240905172321_en.md new file mode 100644 index 00000000000000..d1eb062654ad16 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_320240905172321_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_320240905172321 BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_320240905172321 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_320240905172321` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_320240905172321_en_5.5.0_3.0_1726991973557.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_320240905172321_en_5.5.0_3.0_1726991973557.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_320240905172321","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_320240905172321", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_320240905172321| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.320240905172321 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349_pipeline_en.md new file mode 100644 index 00000000000000..fd977c7362adfc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349_pipeline pipeline BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349_pipeline` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349_pipeline_en_5.5.0_3.0_1727039546460.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349_pipeline_en_5.5.0_3.0_1727039546460.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_420240915122349_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.420240915122349 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_2_25_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_2_25_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline_en.md new file mode 100644 index 00000000000000..d9563946c02d15 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_ep_2_25_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_2_25_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_2_25_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_2_25_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_25_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline_en_5.5.0_3.0_1727042295759.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_25_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline_en_5.5.0_3.0_1727042295759.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_ep_2_25_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_ep_2_25_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_2_25_b_32_lr_1_2e_06_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_600_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-2.25-b-32-lr-1.2e-06-dp-0.3-ss-0-st-False-fh-False-hs-600 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_300_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_300_pipeline_en.md new file mode 100644 index 00000000000000..3540bbd99dd887 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_300_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_300_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_300_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_300_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_300_pipeline_en_5.5.0_3.0_1726992461663.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_300_pipeline_en_5.5.0_3.0_1726992461663.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_300_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_300_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_0_lr_1e_05_wd_0_001_dp_0_99999_swati_300_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.0-lr-1e-05-wd-0.001-dp-0.99999-ss-300 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true_pipeline_en.md new file mode 100644 index 00000000000000..b7b0838c55361d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true_pipeline_en_5.5.0_3.0_1727042463983.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true_pipeline_en_5.5.0_3.0_1727042463983.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_1_0_lr_1e_06_wd_0_001_dp_0_2_swati_500_southern_sotho_true_fh_true_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-1.0-lr-1e-06-wd-0.001-dp-0.2-ss-500-st-True-fh-True + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetuned_cola_ardaaras99_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetuned_cola_ardaaras99_en.md new file mode 100644 index 00000000000000..5d2b88d503657e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetuned_cola_ardaaras99_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_cola_ardaaras99 BertForSequenceClassification from ardaaras99 +author: John Snow Labs +name: bert_base_uncased_finetuned_cola_ardaaras99 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_cola_ardaaras99` is a English model originally trained by ardaaras99. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_cola_ardaaras99_en_5.5.0_3.0_1726990987635.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_cola_ardaaras99_en_5.5.0_3.0_1726990987635.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_cola_ardaaras99","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_cola_ardaaras99", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_cola_ardaaras99| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/ardaaras99/bert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetuned_squad2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetuned_squad2_pipeline_en.md new file mode 100644 index 00000000000000..8d44395f24b6cb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_finetuned_squad2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_squad2_pipeline pipeline BertForQuestionAnswering from thewiz +author: John Snow Labs +name: bert_base_uncased_finetuned_squad2_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_squad2_pipeline` is a English model originally trained by thewiz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_squad2_pipeline_en_5.5.0_3.0_1727042325859.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_squad2_pipeline_en_5.5.0_3.0_1727042325859.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_squad2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_squad2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_squad2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/thewiz/bert-base-uncased-finetuned-squad2 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_squad_v1_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_squad_v1_en.md new file mode 100644 index 00000000000000..ec007aba1d3fef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_base_uncased_squad_v1_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_squad_v1 BertForQuestionAnswering from helenai +author: John Snow Labs +name: bert_base_uncased_squad_v1 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_squad_v1` is a English model originally trained by helenai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_squad_v1_en_5.5.0_3.0_1726978429264.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_squad_v1_en_5.5.0_3.0_1726978429264.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_squad_v1","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_squad_v1", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_squad_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/helenai/bert-base-uncased-squad-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_legalentity_ner_accelerate_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_legalentity_ner_accelerate_en.md new file mode 100644 index 00000000000000..2fefb5d499b6c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_legalentity_ner_accelerate_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_finetuned_legalentity_ner_accelerate BertForTokenClassification from aimlnerd +author: John Snow Labs +name: bert_finetuned_legalentity_ner_accelerate +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_legalentity_ner_accelerate` is a English model originally trained by aimlnerd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_legalentity_ner_accelerate_en_5.5.0_3.0_1727045873689.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_legalentity_ner_accelerate_en_5.5.0_3.0_1727045873689.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_legalentity_ner_accelerate","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_legalentity_ner_accelerate", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_legalentity_ner_accelerate| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/aimlnerd/bert-finetuned-legalentity-ner-accelerate \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_legalentity_ner_accelerate_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_legalentity_ner_accelerate_pipeline_en.md new file mode 100644 index 00000000000000..f1099c1d3a5eee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_legalentity_ner_accelerate_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_finetuned_legalentity_ner_accelerate_pipeline pipeline BertForTokenClassification from aimlnerd +author: John Snow Labs +name: bert_finetuned_legalentity_ner_accelerate_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_legalentity_ner_accelerate_pipeline` is a English model originally trained by aimlnerd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_legalentity_ner_accelerate_pipeline_en_5.5.0_3.0_1727045893745.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_legalentity_ner_accelerate_pipeline_en_5.5.0_3.0_1727045893745.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_finetuned_legalentity_ner_accelerate_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_finetuned_legalentity_ner_accelerate_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_legalentity_ner_accelerate_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/aimlnerd/bert-finetuned-legalentity-ner-accelerate + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_ner_koakande_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_ner_koakande_pipeline_en.md new file mode 100644 index 00000000000000..d3f1e40e49e12b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_ner_koakande_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_finetuned_ner_koakande_pipeline pipeline BertForTokenClassification from koakande +author: John Snow Labs +name: bert_finetuned_ner_koakande_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_ner_koakande_pipeline` is a English model originally trained by koakande. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_koakande_pipeline_en_5.5.0_3.0_1727045459415.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_koakande_pipeline_en_5.5.0_3.0_1727045459415.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_finetuned_ner_koakande_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_finetuned_ner_koakande_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_koakande_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/koakande/bert-finetuned-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_sql_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_sql_en.md new file mode 100644 index 00000000000000..56d6e2ad31ab35 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_finetuned_sql_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_finetuned_sql BertForQuestionAnswering from AlexYang33 +author: John Snow Labs +name: bert_finetuned_sql +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_sql` is a English model originally trained by AlexYang33. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_sql_en_5.5.0_3.0_1726978413801.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_sql_en_5.5.0_3.0_1726978413801.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_finetuned_sql","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_finetuned_sql", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_sql| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/AlexYang33/bert-finetuned-sql \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-bert_racial_bias_model_80_0k_samples_fold_2_en.md b/docs/_posts/ahmedlone127/2024-09-22-bert_racial_bias_model_80_0k_samples_fold_2_en.md new file mode 100644 index 00000000000000..728314f278fe7b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-bert_racial_bias_model_80_0k_samples_fold_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_racial_bias_model_80_0k_samples_fold_2 DistilBertForSequenceClassification from jamnik99 +author: John Snow Labs +name: bert_racial_bias_model_80_0k_samples_fold_2 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_racial_bias_model_80_0k_samples_fold_2` is a English model originally trained by jamnik99. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_racial_bias_model_80_0k_samples_fold_2_en_5.5.0_3.0_1727020980716.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_racial_bias_model_80_0k_samples_fold_2_en_5.5.0_3.0_1727020980716.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_racial_bias_model_80_0k_samples_fold_2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_racial_bias_model_80_0k_samples_fold_2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_racial_bias_model_80_0k_samples_fold_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jamnik99/BERT-racial_bias_model_80.0K_samples_fold_2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_ian_ailex_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_ian_ailex_pipeline_en.md new file mode 100644 index 00000000000000..5786f86531bc52 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_ian_ailex_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_ian_ailex_pipeline pipeline DistilBertForSequenceClassification from Ian-AILex +author: John Snow Labs +name: burmese_awesome_model_ian_ailex_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_ian_ailex_pipeline` is a English model originally trained by Ian-AILex. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_ian_ailex_pipeline_en_5.5.0_3.0_1727021023167.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_ian_ailex_pipeline_en_5.5.0_3.0_1727021023167.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_ian_ailex_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_ian_ailex_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_ian_ailex_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Ian-AILex/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_jimmy77777_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_jimmy77777_en.md new file mode 100644 index 00000000000000..76dbec4d3038f3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_jimmy77777_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_jimmy77777 DistilBertForSequenceClassification from Jimmy77777 +author: John Snow Labs +name: burmese_awesome_model_jimmy77777 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_jimmy77777` is a English model originally trained by Jimmy77777. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_jimmy77777_en_5.5.0_3.0_1727033372711.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_jimmy77777_en_5.5.0_3.0_1727033372711.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_jimmy77777","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_jimmy77777", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_jimmy77777| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Jimmy77777/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_ollamh_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_ollamh_en.md new file mode 100644 index 00000000000000..25398b465d40cb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_awesome_model_ollamh_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_ollamh DistilBertForSequenceClassification from ollamh +author: John Snow Labs +name: burmese_awesome_model_ollamh +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_ollamh` is a English model originally trained by ollamh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_ollamh_en_5.5.0_3.0_1727012676620.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_ollamh_en_5.5.0_3.0_1727012676620.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_ollamh","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_ollamh", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_ollamh| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ollamh/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_bert_question_answering_model5_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_bert_question_answering_model5_en.md new file mode 100644 index 00000000000000..43985cb9bf4404 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_bert_question_answering_model5_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_bert_question_answering_model5 BertForQuestionAnswering from Ashkh0099 +author: John Snow Labs +name: burmese_bert_question_answering_model5 +date: 2024-09-22 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_bert_question_answering_model5` is a English model originally trained by Ashkh0099. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_bert_question_answering_model5_en_5.5.0_3.0_1727039398511.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_bert_question_answering_model5_en_5.5.0_3.0_1727039398511.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("burmese_bert_question_answering_model5","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("burmese_bert_question_answering_model5", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_bert_question_answering_model5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/Ashkh0099/my-bert-question-answering-model5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_fine_tuned_distilbert_lr_1e_05_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_fine_tuned_distilbert_lr_1e_05_en.md new file mode 100644 index 00000000000000..2eff3bfc08a093 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_fine_tuned_distilbert_lr_1e_05_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_fine_tuned_distilbert_lr_1e_05 DistilBertForSequenceClassification from Benuehlinger +author: John Snow Labs +name: burmese_fine_tuned_distilbert_lr_1e_05 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_fine_tuned_distilbert_lr_1e_05` is a English model originally trained by Benuehlinger. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_fine_tuned_distilbert_lr_1e_05_en_5.5.0_3.0_1727020895549.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_fine_tuned_distilbert_lr_1e_05_en_5.5.0_3.0_1727020895549.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_fine_tuned_distilbert_lr_1e_05","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_fine_tuned_distilbert_lr_1e_05", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_fine_tuned_distilbert_lr_1e_05| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Benuehlinger/my-fine-tuned-distilbert-lr-1e-05 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-burmese_nepal_bhasa_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-burmese_nepal_bhasa_model_pipeline_en.md new file mode 100644 index 00000000000000..8a1b6c434e6372 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-burmese_nepal_bhasa_model_pipeline_en.md @@ -0,0 +1,72 @@ +--- +layout: model +title: English burmese_nepal_bhasa_model_pipeline pipeline RoBertaForQuestionAnswering from steffipriyanka +author: John Snow Labs +name: burmese_nepal_bhasa_model_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_nepal_bhasa_model_pipeline` is a English model originally trained by steffipriyanka. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_nepal_bhasa_model_pipeline_en_5.5.0_3.0_1727012681091.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_nepal_bhasa_model_pipeline_en_5.5.0_3.0_1727012681091.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +pipeline = PretrainedPipeline("burmese_nepal_bhasa_model_pipeline", lang = "en") +annotations = pipeline.transform(df) +``` +```scala +val pipeline = new PretrainedPipeline("burmese_nepal_bhasa_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_nepal_bhasa_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +References + +https://huggingface.co/steffipriyanka/my_new_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-deberta_embeddings_tapt_nbme_v3_base_en.md b/docs/_posts/ahmedlone127/2024-09-22-deberta_embeddings_tapt_nbme_v3_base_en.md new file mode 100644 index 00000000000000..45534b57337d86 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-deberta_embeddings_tapt_nbme_v3_base_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English Deberta Embeddings model (from ZZ99) +author: John Snow Labs +name: deberta_embeddings_tapt_nbme_v3_base +date: 2024-09-22 +tags: [deberta, open_source, deberta_embeddings, debertav2formaskedlm, en, onnx, openvino] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: openvino +annotator: DeBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DebertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `tapt_nbme_deberta_v3_base` is a English model originally trained by `ZZ99`. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_embeddings_tapt_nbme_v3_base_en_5.5.0_3.0_1727046746008.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_embeddings_tapt_nbme_v3_base_en_5.5.0_3.0_1727046746008.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + +{:.model-param} + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = DeBertaEmbeddings.pretrained("deberta_embeddings_tapt_nbme_v3_base","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") \ + .setCaseSensitive(True) + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, embeddings]) + +data = spark.createDataFrame([["I love Spark NLP"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val embeddings = DeBertaEmbeddings.pretrained("deberta_embeddings_tapt_nbme_v3_base","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + .setCaseSensitive(true) + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) + +val data = Seq("I love Spark NLP").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_embeddings_tapt_nbme_v3_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|720.7 MB| + +## References + +https://huggingface.co/ZZ99/tapt_nbme_deberta_v3_base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-defsent_roberta_base_mean_en.md b/docs/_posts/ahmedlone127/2024-09-22-defsent_roberta_base_mean_en.md new file mode 100644 index 00000000000000..90a09ba42692bc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-defsent_roberta_base_mean_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English defsent_roberta_base_mean RoBertaEmbeddings from cl-nagoya +author: John Snow Labs +name: defsent_roberta_base_mean +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`defsent_roberta_base_mean` is a English model originally trained by cl-nagoya. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/defsent_roberta_base_mean_en_5.5.0_3.0_1727041747971.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/defsent_roberta_base_mean_en_5.5.0_3.0_1727041747971.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("defsent_roberta_base_mean","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("defsent_roberta_base_mean","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|defsent_roberta_base_mean| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|413.1 MB| + +## References + +https://huggingface.co/cl-nagoya/defsent-roberta-base-mean \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-demo_mangowly_en.md b/docs/_posts/ahmedlone127/2024-09-22-demo_mangowly_en.md new file mode 100644 index 00000000000000..df7cea8027a88e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-demo_mangowly_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English demo_mangowly DistilBertForSequenceClassification from mangowly +author: John Snow Labs +name: demo_mangowly +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`demo_mangowly` is a English model originally trained by mangowly. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/demo_mangowly_en_5.5.0_3.0_1727033601485.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/demo_mangowly_en_5.5.0_3.0_1727033601485.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("demo_mangowly","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("demo_mangowly", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|demo_mangowly| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|250.6 MB| + +## References + +https://huggingface.co/mangowly/demo \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-demo_mangowly_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-demo_mangowly_pipeline_en.md new file mode 100644 index 00000000000000..1471ad3990bcfd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-demo_mangowly_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English demo_mangowly_pipeline pipeline DistilBertForSequenceClassification from mangowly +author: John Snow Labs +name: demo_mangowly_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`demo_mangowly_pipeline` is a English model originally trained by mangowly. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/demo_mangowly_pipeline_en_5.5.0_3.0_1727033619846.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/demo_mangowly_pipeline_en_5.5.0_3.0_1727033619846.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("demo_mangowly_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("demo_mangowly_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|demo_mangowly_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|250.6 MB| + +## References + +https://huggingface.co/mangowly/demo + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_cased_sst2_ft_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_cased_sst2_ft_pipeline_en.md new file mode 100644 index 00000000000000..2e70b3bf04eb49 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_cased_sst2_ft_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_cased_sst2_ft_pipeline pipeline DistilBertForSequenceClassification from EgehanEralp +author: John Snow Labs +name: distilbert_base_cased_sst2_ft_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_cased_sst2_ft_pipeline` is a English model originally trained by EgehanEralp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_sst2_ft_pipeline_en_5.5.0_3.0_1726980645033.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_sst2_ft_pipeline_en_5.5.0_3.0_1726980645033.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_cased_sst2_ft_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_cased_sst2_ft_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_cased_sst2_ft_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/EgehanEralp/distilbert-base-cased-sst2-ft + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_finetuned_imdb_sentiment_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_finetuned_imdb_sentiment_pipeline_en.md new file mode 100644 index 00000000000000..2f53e452947857 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_finetuned_imdb_sentiment_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_finetuned_imdb_sentiment_pipeline pipeline DistilBertForSequenceClassification from lyrisha +author: John Snow Labs +name: distilbert_base_finetuned_imdb_sentiment_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_finetuned_imdb_sentiment_pipeline` is a English model originally trained by lyrisha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_finetuned_imdb_sentiment_pipeline_en_5.5.0_3.0_1727033368157.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_finetuned_imdb_sentiment_pipeline_en_5.5.0_3.0_1727033368157.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_finetuned_imdb_sentiment_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_finetuned_imdb_sentiment_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_finetuned_imdb_sentiment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lyrisha/distilbert-base-finetuned-imdb-sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_3epoch5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_3epoch5_pipeline_en.md new file mode 100644 index 00000000000000..0237b2100c3897 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_3epoch5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_3epoch5_pipeline pipeline DistilBertForSequenceClassification from dianamihalache27 +author: John Snow Labs +name: distilbert_base_uncased_3epoch5_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_3epoch5_pipeline` is a English model originally trained by dianamihalache27. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_3epoch5_pipeline_en_5.5.0_3.0_1727035596787.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_3epoch5_pipeline_en_5.5.0_3.0_1727035596787.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_3epoch5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_3epoch5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_3epoch5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/dianamihalache27/distilbert-base-uncased_3epoch5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp_pipeline_en.md new file mode 100644 index 00000000000000..926d8d8e9825df --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp_pipeline_en_5.5.0_3.0_1727033826204.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp_pipeline_en_5.5.0_3.0_1727033826204.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_banking_zphr_0st72_ut72ut1_plain_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_banking_zphr_0st72_ut72ut1_plain_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_distilled_clinc_dro14_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_distilled_clinc_dro14_pipeline_en.md new file mode 100644 index 00000000000000..709f8497cf29e5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_distilled_clinc_dro14_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_distilled_clinc_dro14_pipeline pipeline DistilBertForSequenceClassification from dro14 +author: John Snow Labs +name: distilbert_base_uncased_distilled_clinc_dro14_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distilled_clinc_dro14_pipeline` is a English model originally trained by dro14. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_dro14_pipeline_en_5.5.0_3.0_1726980434719.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_clinc_dro14_pipeline_en_5.5.0_3.0_1726980434719.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_distilled_clinc_dro14_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_distilled_clinc_dro14_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distilled_clinc_dro14_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/dro14/distilbert-base-uncased-distilled-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_bensonhugging_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_bensonhugging_pipeline_en.md new file mode 100644 index 00000000000000..2125118e638801 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_cola_bensonhugging_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_bensonhugging_pipeline pipeline DistilBertForSequenceClassification from BensonHugging +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_bensonhugging_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_bensonhugging_pipeline` is a English model originally trained by BensonHugging. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_bensonhugging_pipeline_en_5.5.0_3.0_1727033274639.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_bensonhugging_pipeline_en_5.5.0_3.0_1727033274639.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_bensonhugging_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_bensonhugging_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_bensonhugging_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/BensonHugging/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_evernight017_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_evernight017_pipeline_en.md new file mode 100644 index 00000000000000..5bc32c715c35d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_evernight017_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_evernight017_pipeline pipeline DistilBertForSequenceClassification from evernight017 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_evernight017_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_evernight017_pipeline` is a English model originally trained by evernight017. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_evernight017_pipeline_en_5.5.0_3.0_1727035071619.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_evernight017_pipeline_en_5.5.0_3.0_1727035071619.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_evernight017_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_evernight017_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_evernight017_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/evernight017/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_jason_oh_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_jason_oh_en.md new file mode 100644 index 00000000000000..b7b5f9bf80c7e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_jason_oh_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_jason_oh DistilBertForSequenceClassification from Jason-Oh +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_jason_oh +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_jason_oh` is a English model originally trained by Jason-Oh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jason_oh_en_5.5.0_3.0_1727020892721.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jason_oh_en_5.5.0_3.0_1727020892721.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_jason_oh","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_jason_oh", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_jason_oh| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Jason-Oh/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_overall_4_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_overall_4_0_pipeline_en.md new file mode 100644 index 00000000000000..5051a44bddca0b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_overall_4_0_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_overall_4_0_pipeline pipeline DistilBertForSequenceClassification from LeBruse +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_overall_4_0_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_overall_4_0_pipeline` is a English model originally trained by LeBruse. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_overall_4_0_pipeline_en_5.5.0_3.0_1727020590064.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_overall_4_0_pipeline_en_5.5.0_3.0_1727020590064.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_overall_4_0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_overall_4_0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_overall_4_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LeBruse/distilbert-base-uncased-finetuned-emotion-overall-4.0 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_souling_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_souling_pipeline_en.md new file mode 100644 index 00000000000000..849c52f24f5d56 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_souling_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_souling_pipeline pipeline DistilBertForSequenceClassification from souling +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_souling_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_souling_pipeline` is a English model originally trained by souling. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_souling_pipeline_en_5.5.0_3.0_1726980728475.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_souling_pipeline_en_5.5.0_3.0_1726980728475.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_souling_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_souling_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_souling_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/souling/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_yamaguchi_kota_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_yamaguchi_kota_pipeline_en.md new file mode 100644 index 00000000000000..ec283c84d03f14 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_yamaguchi_kota_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_yamaguchi_kota_pipeline pipeline DistilBertForSequenceClassification from yamaguchi-kota +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_yamaguchi_kota_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_yamaguchi_kota_pipeline` is a English model originally trained by yamaguchi-kota. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_yamaguchi_kota_pipeline_en_5.5.0_3.0_1727012355775.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_yamaguchi_kota_pipeline_en_5.5.0_3.0_1727012355775.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_yamaguchi_kota_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_yamaguchi_kota_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_yamaguchi_kota_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/yamaguchi-kota/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_yashcfc_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_yashcfc_en.md new file mode 100644 index 00000000000000..e3e0b63da1ae24 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotion_yashcfc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_yashcfc DistilBertForSequenceClassification from yashcfc +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_yashcfc +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_yashcfc` is a English model originally trained by yashcfc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_yashcfc_en_5.5.0_3.0_1727033132643.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_yashcfc_en_5.5.0_3.0_1727033132643.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_yashcfc","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_yashcfc", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_yashcfc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/yashcfc/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotions_fibleep_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotions_fibleep_pipeline_en.md new file mode 100644 index 00000000000000..7dda42573b11dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_finetuned_emotions_fibleep_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotions_fibleep_pipeline pipeline DistilBertForSequenceClassification from fibleep +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotions_fibleep_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotions_fibleep_pipeline` is a English model originally trained by fibleep. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotions_fibleep_pipeline_en_5.5.0_3.0_1727035267304.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotions_fibleep_pipeline_en_5.5.0_3.0_1727035267304.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotions_fibleep_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotions_fibleep_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotions_fibleep_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/fibleep/distilbert-base-uncased-finetuned-emotions + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline_en.md new file mode 100644 index 00000000000000..4e431e07531d89 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline_en_5.5.0_3.0_1727033467897.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline_en_5.5.0_3.0_1727033467897.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st13sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st13sd_ut72ut5_PLPrefix0stlarge_simsp100_clean100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200_pipeline_en.md new file mode 100644 index 00000000000000..356988e65e1e70 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200_pipeline_en_5.5.0_3.0_1726980337774.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200_pipeline_en_5.5.0_3.0_1726980337774.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st5sd_ut72ut1largepfxnf_simsp300_clean200_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st5sd_ut72ut1largePfxNf_simsp300_clean200 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp_pipeline_en.md new file mode 100644 index 00000000000000..5264ea709df924 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp_pipeline_en_5.5.0_3.0_1727021076209.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp_pipeline_en_5.5.0_3.0_1727021076209.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_travel_zphr_0st_ut72ut1_plain_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_travel_zphr_0st_ut72ut1_plain_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_batch_size_64_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_batch_size_64_pipeline_en.md new file mode 100644 index 00000000000000..7d6cd8859aaa06 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_batch_size_64_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_batch_size_64_pipeline pipeline DistilBertForSequenceClassification from K-kiron +author: John Snow Labs +name: distilbert_batch_size_64_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_batch_size_64_pipeline` is a English model originally trained by K-kiron. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_batch_size_64_pipeline_en_5.5.0_3.0_1727012247031.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_batch_size_64_pipeline_en_5.5.0_3.0_1727012247031.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_batch_size_64_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_batch_size_64_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_batch_size_64_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/K-kiron/distilbert-batch-size-64 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_finetuned_imdb_sentiment_omalve_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_finetuned_imdb_sentiment_omalve_pipeline_en.md new file mode 100644 index 00000000000000..7ee51c1c97f87f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_finetuned_imdb_sentiment_omalve_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_finetuned_imdb_sentiment_omalve_pipeline pipeline DistilBertForSequenceClassification from OmAlve +author: John Snow Labs +name: distilbert_finetuned_imdb_sentiment_omalve_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_imdb_sentiment_omalve_pipeline` is a English model originally trained by OmAlve. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_imdb_sentiment_omalve_pipeline_en_5.5.0_3.0_1727035193447.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_imdb_sentiment_omalve_pipeline_en_5.5.0_3.0_1727035193447.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_finetuned_imdb_sentiment_omalve_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_finetuned_imdb_sentiment_omalve_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_imdb_sentiment_omalve_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/OmAlve/distilbert-finetuned-imdb-sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192_en.md new file mode 100644 index 00000000000000..491c9ca4624886 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192 DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192_en_5.5.0_3.0_1727033792143.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192_en_5.5.0_3.0_1727033792143.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_data_aug_mnli_192| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|52.7 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_data_aug_mnli_192 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256_pipeline_en.md new file mode 100644 index 00000000000000..b1ed0dc06c7dad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256_pipeline_en_5.5.0_3.0_1726980512398.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256_pipeline_en_5.5.0_3.0_1726980512398.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mnli_256_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|71.7 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_data_aug_mnli_256 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256_en.md new file mode 100644 index 00000000000000..4c385b7afc0b9d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256 DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256_en_5.5.0_3.0_1727020578387.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256_en_5.5.0_3.0_1727020578387.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_qqp_256| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|71.7 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_qqp_256 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-distilroberta_base_finetuned_leftarticles_mlm_epochier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-distilroberta_base_finetuned_leftarticles_mlm_epochier_pipeline_en.md new file mode 100644 index 00000000000000..e6f719c17bcf86 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-distilroberta_base_finetuned_leftarticles_mlm_epochier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_finetuned_leftarticles_mlm_epochier_pipeline pipeline RoBertaEmbeddings from happybusinessperson +author: John Snow Labs +name: distilroberta_base_finetuned_leftarticles_mlm_epochier_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_finetuned_leftarticles_mlm_epochier_pipeline` is a English model originally trained by happybusinessperson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_leftarticles_mlm_epochier_pipeline_en_5.5.0_3.0_1726999940632.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_leftarticles_mlm_epochier_pipeline_en_5.5.0_3.0_1726999940632.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilroberta_base_finetuned_leftarticles_mlm_epochier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilroberta_base_finetuned_leftarticles_mlm_epochier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_finetuned_leftarticles_mlm_epochier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/happybusinessperson/distilroberta-base-finetuned-leftarticles-mlm-epochier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-drug_ner_cat_v1_ca.md b/docs/_posts/ahmedlone127/2024-09-22-drug_ner_cat_v1_ca.md new file mode 100644 index 00000000000000..ae1af2d356094b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-drug_ner_cat_v1_ca.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Catalan, Valencian drug_ner_cat_v1 RoBertaForTokenClassification from BSC-NLP4BIA +author: John Snow Labs +name: drug_ner_cat_v1 +date: 2024-09-22 +tags: [ca, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: ca +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`drug_ner_cat_v1` is a Catalan, Valencian model originally trained by BSC-NLP4BIA. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/drug_ner_cat_v1_ca_5.5.0_3.0_1727048485082.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/drug_ner_cat_v1_ca_5.5.0_3.0_1727048485082.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("drug_ner_cat_v1","ca") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("drug_ner_cat_v1", "ca") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|drug_ner_cat_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|ca| +|Size:|436.0 MB| + +## References + +https://huggingface.co/BSC-NLP4BIA/drug-ner-cat-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-elatable_lp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-elatable_lp_pipeline_en.md new file mode 100644 index 00000000000000..488014362f2d1c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-elatable_lp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English elatable_lp_pipeline pipeline DistilBertForSequenceClassification from gaborcselle +author: John Snow Labs +name: elatable_lp_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`elatable_lp_pipeline` is a English model originally trained by gaborcselle. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/elatable_lp_pipeline_en_5.5.0_3.0_1726980596925.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/elatable_lp_pipeline_en_5.5.0_3.0_1726980596925.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("elatable_lp_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("elatable_lp_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|elatable_lp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/gaborcselle/elatable-lp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-fakenews_roberta_large_grad_en.md b/docs/_posts/ahmedlone127/2024-09-22-fakenews_roberta_large_grad_en.md new file mode 100644 index 00000000000000..695cfb565c820c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-fakenews_roberta_large_grad_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English fakenews_roberta_large_grad RoBertaForSequenceClassification from Denyol +author: John Snow Labs +name: fakenews_roberta_large_grad +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fakenews_roberta_large_grad` is a English model originally trained by Denyol. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fakenews_roberta_large_grad_en_5.5.0_3.0_1727037590807.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fakenews_roberta_large_grad_en_5.5.0_3.0_1727037590807.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("fakenews_roberta_large_grad","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("fakenews_roberta_large_grad", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fakenews_roberta_large_grad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Denyol/FakeNews-roberta-large-grad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-films_hate_offensive_roberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-films_hate_offensive_roberta_pipeline_en.md new file mode 100644 index 00000000000000..842c2a3b8e8fb8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-films_hate_offensive_roberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English films_hate_offensive_roberta_pipeline pipeline RoBertaForSequenceClassification from esmarquez17 +author: John Snow Labs +name: films_hate_offensive_roberta_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`films_hate_offensive_roberta_pipeline` is a English model originally trained by esmarquez17. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/films_hate_offensive_roberta_pipeline_en_5.5.0_3.0_1726972235315.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/films_hate_offensive_roberta_pipeline_en_5.5.0_3.0_1726972235315.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("films_hate_offensive_roberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("films_hate_offensive_roberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|films_hate_offensive_roberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|448.5 MB| + +## References + +https://huggingface.co/esmarquez17/films-hate-offensive-roberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-final_model_thebisso09_en.md b/docs/_posts/ahmedlone127/2024-09-22-final_model_thebisso09_en.md new file mode 100644 index 00000000000000..10f2a20641daf3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-final_model_thebisso09_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English final_model_thebisso09 DistilBertForSequenceClassification from Thebisso09 +author: John Snow Labs +name: final_model_thebisso09 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`final_model_thebisso09` is a English model originally trained by Thebisso09. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/final_model_thebisso09_en_5.5.0_3.0_1727033708824.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/final_model_thebisso09_en_5.5.0_3.0_1727033708824.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("final_model_thebisso09","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("final_model_thebisso09", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|final_model_thebisso09| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Thebisso09/final_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetune_distilbert_sst_avalinguo_fluency_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetune_distilbert_sst_avalinguo_fluency_pipeline_en.md new file mode 100644 index 00000000000000..2b6a0fe2b6aec7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetune_distilbert_sst_avalinguo_fluency_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetune_distilbert_sst_avalinguo_fluency_pipeline pipeline DistilBertForSequenceClassification from papasega +author: John Snow Labs +name: finetune_distilbert_sst_avalinguo_fluency_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetune_distilbert_sst_avalinguo_fluency_pipeline` is a English model originally trained by papasega. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetune_distilbert_sst_avalinguo_fluency_pipeline_en_5.5.0_3.0_1726980011756.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetune_distilbert_sst_avalinguo_fluency_pipeline_en_5.5.0_3.0_1726980011756.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetune_distilbert_sst_avalinguo_fluency_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetune_distilbert_sst_avalinguo_fluency_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetune_distilbert_sst_avalinguo_fluency_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/papasega/finetune_Distilbert_SST_Avalinguo_Fluency + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuned_demo_2x_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuned_demo_2x_pipeline_en.md new file mode 100644 index 00000000000000..34cd4be0125601 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuned_demo_2x_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuned_demo_2x_pipeline pipeline DistilBertForSequenceClassification from nardellu +author: John Snow Labs +name: finetuned_demo_2x_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_demo_2x_pipeline` is a English model originally trained by nardellu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_demo_2x_pipeline_en_5.5.0_3.0_1727020669354.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_demo_2x_pipeline_en_5.5.0_3.0_1727020669354.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuned_demo_2x_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuned_demo_2x_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_demo_2x_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/nardellu/finetuned_demo_2X + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_ammarasmro_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_ammarasmro_pipeline_en.md new file mode 100644 index 00000000000000..9e471f0e9512fd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_3000_samples_ammarasmro_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_ammarasmro_pipeline pipeline DistilBertForSequenceClassification from ammarasmro +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_ammarasmro_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_ammarasmro_pipeline` is a English model originally trained by ammarasmro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ammarasmro_pipeline_en_5.5.0_3.0_1727012809317.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ammarasmro_pipeline_en_5.5.0_3.0_1727012809317.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_ammarasmro_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_ammarasmro_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_ammarasmro_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ammarasmro/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_qiuxuan_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_qiuxuan_pipeline_en.md new file mode 100644 index 00000000000000..a2e42faff1aae5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-finetuning_sentiment_model_qiuxuan_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_qiuxuan_pipeline pipeline DistilBertForSequenceClassification from Qiuxuan +author: John Snow Labs +name: finetuning_sentiment_model_qiuxuan_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_qiuxuan_pipeline` is a English model originally trained by Qiuxuan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_qiuxuan_pipeline_en_5.5.0_3.0_1726980567325.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_qiuxuan_pipeline_en_5.5.0_3.0_1726980567325.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_qiuxuan_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_qiuxuan_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_qiuxuan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Qiuxuan/finetuning-sentiment-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-frugalscore_medium_roberta_bert_score_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-frugalscore_medium_roberta_bert_score_pipeline_en.md new file mode 100644 index 00000000000000..bc3519dd126394 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-frugalscore_medium_roberta_bert_score_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English frugalscore_medium_roberta_bert_score_pipeline pipeline BertForSequenceClassification from moussaKam +author: John Snow Labs +name: frugalscore_medium_roberta_bert_score_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`frugalscore_medium_roberta_bert_score_pipeline` is a English model originally trained by moussaKam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/frugalscore_medium_roberta_bert_score_pipeline_en_5.5.0_3.0_1727034489087.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/frugalscore_medium_roberta_bert_score_pipeline_en_5.5.0_3.0_1727034489087.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("frugalscore_medium_roberta_bert_score_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("frugalscore_medium_roberta_bert_score_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|frugalscore_medium_roberta_bert_score_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|155.2 MB| + +## References + +https://huggingface.co/moussaKam/frugalscore_medium_roberta_bert-score + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-hate_hate_random2_seed2_twitter_roberta_large_2022_154m_en.md b/docs/_posts/ahmedlone127/2024-09-22-hate_hate_random2_seed2_twitter_roberta_large_2022_154m_en.md new file mode 100644 index 00000000000000..6981e8b1e9d146 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-hate_hate_random2_seed2_twitter_roberta_large_2022_154m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hate_hate_random2_seed2_twitter_roberta_large_2022_154m RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_random2_seed2_twitter_roberta_large_2022_154m +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_random2_seed2_twitter_roberta_large_2022_154m` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_random2_seed2_twitter_roberta_large_2022_154m_en_5.5.0_3.0_1727027417241.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_random2_seed2_twitter_roberta_large_2022_154m_en_5.5.0_3.0_1727027417241.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_random2_seed2_twitter_roberta_large_2022_154m","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_random2_seed2_twitter_roberta_large_2022_154m", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_random2_seed2_twitter_roberta_large_2022_154m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_random2_seed2-twitter-roberta-large-2022-154m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-hw01_acezkevinz_en.md b/docs/_posts/ahmedlone127/2024-09-22-hw01_acezkevinz_en.md new file mode 100644 index 00000000000000..58fd80985a94ff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-hw01_acezkevinz_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hw01_acezkevinz DistilBertForSequenceClassification from AcEzKeViNz +author: John Snow Labs +name: hw01_acezkevinz +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hw01_acezkevinz` is a English model originally trained by AcEzKeViNz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hw01_acezkevinz_en_5.5.0_3.0_1727033596874.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hw01_acezkevinz_en_5.5.0_3.0_1727033596874.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("hw01_acezkevinz","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("hw01_acezkevinz", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hw01_acezkevinz| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/AcEzKeViNz/HW01 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-hw01_acezkevinz_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-hw01_acezkevinz_pipeline_en.md new file mode 100644 index 00000000000000..0d7433a3c83e33 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-hw01_acezkevinz_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hw01_acezkevinz_pipeline pipeline DistilBertForSequenceClassification from AcEzKeViNz +author: John Snow Labs +name: hw01_acezkevinz_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hw01_acezkevinz_pipeline` is a English model originally trained by AcEzKeViNz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hw01_acezkevinz_pipeline_en_5.5.0_3.0_1727033618897.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hw01_acezkevinz_pipeline_en_5.5.0_3.0_1727033618897.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hw01_acezkevinz_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hw01_acezkevinz_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hw01_acezkevinz_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/AcEzKeViNz/HW01 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-imdbreviews_classification_roberta_v02_clf_finetuning_en.md b/docs/_posts/ahmedlone127/2024-09-22-imdbreviews_classification_roberta_v02_clf_finetuning_en.md new file mode 100644 index 00000000000000..3a4a5b2dfba5b7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-imdbreviews_classification_roberta_v02_clf_finetuning_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English imdbreviews_classification_roberta_v02_clf_finetuning RoBertaForSequenceClassification from darmendarizp +author: John Snow Labs +name: imdbreviews_classification_roberta_v02_clf_finetuning +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`imdbreviews_classification_roberta_v02_clf_finetuning` is a English model originally trained by darmendarizp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/imdbreviews_classification_roberta_v02_clf_finetuning_en_5.5.0_3.0_1727017298269.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/imdbreviews_classification_roberta_v02_clf_finetuning_en_5.5.0_3.0_1727017298269.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("imdbreviews_classification_roberta_v02_clf_finetuning","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("imdbreviews_classification_roberta_v02_clf_finetuning", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|imdbreviews_classification_roberta_v02_clf_finetuning| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/darmendarizp/imdbreviews_classification_roberta_v02_clf_finetuning \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-insta_sentiment_distill_roberta_custom_data_en.md b/docs/_posts/ahmedlone127/2024-09-22-insta_sentiment_distill_roberta_custom_data_en.md new file mode 100644 index 00000000000000..038f1cc1ddd5a8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-insta_sentiment_distill_roberta_custom_data_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English insta_sentiment_distill_roberta_custom_data RoBertaForSequenceClassification from davin45 +author: John Snow Labs +name: insta_sentiment_distill_roberta_custom_data +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`insta_sentiment_distill_roberta_custom_data` is a English model originally trained by davin45. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/insta_sentiment_distill_roberta_custom_data_en_5.5.0_3.0_1727037296932.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/insta_sentiment_distill_roberta_custom_data_en_5.5.0_3.0_1727037296932.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("insta_sentiment_distill_roberta_custom_data","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("insta_sentiment_distill_roberta_custom_data", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|insta_sentiment_distill_roberta_custom_data| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/davin45/insta-sentiment-distill-roberta-custom_data \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-insta_sentiment_distill_roberta_custom_data_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-insta_sentiment_distill_roberta_custom_data_pipeline_en.md new file mode 100644 index 00000000000000..f3ed73a8ca83ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-insta_sentiment_distill_roberta_custom_data_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English insta_sentiment_distill_roberta_custom_data_pipeline pipeline RoBertaForSequenceClassification from davin45 +author: John Snow Labs +name: insta_sentiment_distill_roberta_custom_data_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`insta_sentiment_distill_roberta_custom_data_pipeline` is a English model originally trained by davin45. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/insta_sentiment_distill_roberta_custom_data_pipeline_en_5.5.0_3.0_1727037316796.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/insta_sentiment_distill_roberta_custom_data_pipeline_en_5.5.0_3.0_1727037316796.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("insta_sentiment_distill_roberta_custom_data_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("insta_sentiment_distill_roberta_custom_data_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|insta_sentiment_distill_roberta_custom_data_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/davin45/insta-sentiment-distill-roberta-custom_data + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-javanese_bert_small_imdb_classifier_pipeline_jv.md b/docs/_posts/ahmedlone127/2024-09-22-javanese_bert_small_imdb_classifier_pipeline_jv.md new file mode 100644 index 00000000000000..e9b581cb1aa115 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-javanese_bert_small_imdb_classifier_pipeline_jv.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Javanese javanese_bert_small_imdb_classifier_pipeline pipeline BertForSequenceClassification from w11wo +author: John Snow Labs +name: javanese_bert_small_imdb_classifier_pipeline +date: 2024-09-22 +tags: [jv, open_source, pipeline, onnx] +task: Text Classification +language: jv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`javanese_bert_small_imdb_classifier_pipeline` is a Javanese model originally trained by w11wo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/javanese_bert_small_imdb_classifier_pipeline_jv_5.5.0_3.0_1727032178801.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/javanese_bert_small_imdb_classifier_pipeline_jv_5.5.0_3.0_1727032178801.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("javanese_bert_small_imdb_classifier_pipeline", lang = "jv") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("javanese_bert_small_imdb_classifier_pipeline", lang = "jv") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|javanese_bert_small_imdb_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|jv| +|Size:|409.5 MB| + +## References + +https://huggingface.co/w11wo/javanese-bert-small-imdb-classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-jobclassifier_v2_en.md b/docs/_posts/ahmedlone127/2024-09-22-jobclassifier_v2_en.md new file mode 100644 index 00000000000000..da2ee5a441db8f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-jobclassifier_v2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English jobclassifier_v2 BertForSequenceClassification from CleveGreen +author: John Snow Labs +name: jobclassifier_v2 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`jobclassifier_v2` is a English model originally trained by CleveGreen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/jobclassifier_v2_en_5.5.0_3.0_1727030597140.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/jobclassifier_v2_en_5.5.0_3.0_1727030597140.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("jobclassifier_v2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("jobclassifier_v2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jobclassifier_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|406.3 MB| + +## References + +https://huggingface.co/CleveGreen/JobClassifier_v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-llm_project_en.md b/docs/_posts/ahmedlone127/2024-09-22-llm_project_en.md new file mode 100644 index 00000000000000..1447110116642c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-llm_project_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English llm_project DistilBertForSequenceClassification from ThuyTran102 +author: John Snow Labs +name: llm_project +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`llm_project` is a English model originally trained by ThuyTran102. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/llm_project_en_5.5.0_3.0_1727020559412.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/llm_project_en_5.5.0_3.0_1727020559412.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("llm_project","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("llm_project", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|llm_project| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ThuyTran102/LLM_project \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-mmarco_mminilmv2_l12_h384_v1_en.md b/docs/_posts/ahmedlone127/2024-09-22-mmarco_mminilmv2_l12_h384_v1_en.md new file mode 100644 index 00000000000000..7b26f9012a9a86 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-mmarco_mminilmv2_l12_h384_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mmarco_mminilmv2_l12_h384_v1 XlmRoBertaForSequenceClassification from lpsantao +author: John Snow Labs +name: mmarco_mminilmv2_l12_h384_v1 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mmarco_mminilmv2_l12_h384_v1` is a English model originally trained by lpsantao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mmarco_mminilmv2_l12_h384_v1_en_5.5.0_3.0_1727009949173.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mmarco_mminilmv2_l12_h384_v1_en_5.5.0_3.0_1727009949173.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("mmarco_mminilmv2_l12_h384_v1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("mmarco_mminilmv2_l12_h384_v1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mmarco_mminilmv2_l12_h384_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|399.6 MB| + +## References + +https://huggingface.co/lpsantao/mmarco-mMiniLMv2-L12-H384-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-mmlu_physics_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-mmlu_physics_classifier_pipeline_en.md new file mode 100644 index 00000000000000..812ebbf6ef8718 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-mmlu_physics_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mmlu_physics_classifier_pipeline pipeline RoBertaForSequenceClassification from chrisliu298 +author: John Snow Labs +name: mmlu_physics_classifier_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mmlu_physics_classifier_pipeline` is a English model originally trained by chrisliu298. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mmlu_physics_classifier_pipeline_en_5.5.0_3.0_1727026612058.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mmlu_physics_classifier_pipeline_en_5.5.0_3.0_1727026612058.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mmlu_physics_classifier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mmlu_physics_classifier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mmlu_physics_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|439.3 MB| + +## References + +https://huggingface.co/chrisliu298/mmlu-physics_classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-model_2_7_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-model_2_7_pipeline_en.md new file mode 100644 index 00000000000000..d366bf1079b338 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-model_2_7_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English model_2_7_pipeline pipeline RoBertaForSequenceClassification from raydentseng +author: John Snow Labs +name: model_2_7_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_2_7_pipeline` is a English model originally trained by raydentseng. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_2_7_pipeline_en_5.5.0_3.0_1727037770788.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_2_7_pipeline_en_5.5.0_3.0_1727037770788.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("model_2_7_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("model_2_7_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_2_7_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|438.0 MB| + +## References + +https://huggingface.co/raydentseng/model_2_7 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-model_sentence_entailment_hackaton_coliee_en.md b/docs/_posts/ahmedlone127/2024-09-22-model_sentence_entailment_hackaton_coliee_en.md new file mode 100644 index 00000000000000..481e674adbf445 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-model_sentence_entailment_hackaton_coliee_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English model_sentence_entailment_hackaton_coliee RoBertaForSequenceClassification from ludoviciarraga +author: John Snow Labs +name: model_sentence_entailment_hackaton_coliee +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_sentence_entailment_hackaton_coliee` is a English model originally trained by ludoviciarraga. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_sentence_entailment_hackaton_coliee_en_5.5.0_3.0_1726967623298.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_sentence_entailment_hackaton_coliee_en_5.5.0_3.0_1726967623298.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("model_sentence_entailment_hackaton_coliee","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("model_sentence_entailment_hackaton_coliee", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_sentence_entailment_hackaton_coliee| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ludoviciarraga/model_sentence_entailment_hackaton_coliee \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-model_token_classification_bert_base_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-model_token_classification_bert_base_ner_pipeline_en.md new file mode 100644 index 00000000000000..34367ac8ef3cae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-model_token_classification_bert_base_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English model_token_classification_bert_base_ner_pipeline pipeline BertForTokenClassification from Ornelas7 +author: John Snow Labs +name: model_token_classification_bert_base_ner_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_token_classification_bert_base_ner_pipeline` is a English model originally trained by Ornelas7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_token_classification_bert_base_ner_pipeline_en_5.5.0_3.0_1727045812995.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_token_classification_bert_base_ner_pipeline_en_5.5.0_3.0_1727045812995.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("model_token_classification_bert_base_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("model_token_classification_bert_base_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_token_classification_bert_base_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/Ornelas7/model-token-classification-bert-base-NER + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-ner_serverstable_v0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-ner_serverstable_v0_pipeline_en.md new file mode 100644 index 00000000000000..26d12b940ab797 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-ner_serverstable_v0_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ner_serverstable_v0_pipeline pipeline BertForTokenClassification from procit002 +author: John Snow Labs +name: ner_serverstable_v0_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_serverstable_v0_pipeline` is a English model originally trained by procit002. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_serverstable_v0_pipeline_en_5.5.0_3.0_1727045459554.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_serverstable_v0_pipeline_en_5.5.0_3.0_1727045459554.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ner_serverstable_v0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ner_serverstable_v0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_serverstable_v0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/procit002/NER_ServerStable_v0 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-ptcrawl_plus_legal_base_v3_5__checkpoint_last_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-ptcrawl_plus_legal_base_v3_5__checkpoint_last_pipeline_en.md new file mode 100644 index 00000000000000..f185be0d85a346 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-ptcrawl_plus_legal_base_v3_5__checkpoint_last_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ptcrawl_plus_legal_base_v3_5__checkpoint_last_pipeline pipeline RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: ptcrawl_plus_legal_base_v3_5__checkpoint_last_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ptcrawl_plus_legal_base_v3_5__checkpoint_last_pipeline` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ptcrawl_plus_legal_base_v3_5__checkpoint_last_pipeline_en_5.5.0_3.0_1727041930558.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ptcrawl_plus_legal_base_v3_5__checkpoint_last_pipeline_en_5.5.0_3.0_1727041930558.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ptcrawl_plus_legal_base_v3_5__checkpoint_last_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ptcrawl_plus_legal_base_v3_5__checkpoint_last_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ptcrawl_plus_legal_base_v3_5__checkpoint_last_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|296.7 MB| + +## References + +https://huggingface.co/eduagarcia-temp/ptcrawl_plus_legal_base_v3_5__checkpoint_last + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb_pipeline_en.md new file mode 100644 index 00000000000000..da77f3cf560b77 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb_pipeline pipeline RoBertaForSequenceClassification from pamelapaolacb +author: John Snow Labs +name: roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb_pipeline` is a English model originally trained by pamelapaolacb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb_pipeline_en_5.5.0_3.0_1726972331072.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb_pipeline_en_5.5.0_3.0_1726972331072.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_jou_amazon_reviews_multi_pamelapaolacb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|446.8 MB| + +## References + +https://huggingface.co/pamelapaolacb/roberta-base-bne-jou-amazon_reviews_multi + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_cased_finetuned_mnli_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_cased_finetuned_mnli_en.md new file mode 100644 index 00000000000000..cb1164c37af251 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_cased_finetuned_mnli_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_cased_finetuned_mnli RoBertaForSequenceClassification from George-Ogden +author: John Snow Labs +name: roberta_base_cased_finetuned_mnli +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_cased_finetuned_mnli` is a English model originally trained by George-Ogden. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_cased_finetuned_mnli_en_5.5.0_3.0_1727017431827.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_cased_finetuned_mnli_en_5.5.0_3.0_1727017431827.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_cased_finetuned_mnli","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_cased_finetuned_mnli", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_cased_finetuned_mnli| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|462.5 MB| + +## References + +https://huggingface.co/George-Ogden/roberta-base-cased-finetuned-mnli \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_dutch_oscar23_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_dutch_oscar23_en.md new file mode 100644 index 00000000000000..d3f786cf65499d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_dutch_oscar23_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_dutch_oscar23 RoBertaEmbeddings from FremyCompany +author: John Snow Labs +name: roberta_base_dutch_oscar23 +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_dutch_oscar23` is a English model originally trained by FremyCompany. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_dutch_oscar23_en_5.5.0_3.0_1726999567115.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_dutch_oscar23_en_5.5.0_3.0_1726999567115.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_dutch_oscar23","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_dutch_oscar23","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_dutch_oscar23| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|464.8 MB| + +## References + +https://huggingface.co/FremyCompany/roberta-base-nl-oscar23 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_dutch_oscar23_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_dutch_oscar23_pipeline_en.md new file mode 100644 index 00000000000000..b86517e842f370 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_dutch_oscar23_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_dutch_oscar23_pipeline pipeline RoBertaEmbeddings from FremyCompany +author: John Snow Labs +name: roberta_base_dutch_oscar23_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_dutch_oscar23_pipeline` is a English model originally trained by FremyCompany. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_dutch_oscar23_pipeline_en_5.5.0_3.0_1726999587737.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_dutch_oscar23_pipeline_en_5.5.0_3.0_1726999587737.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_dutch_oscar23_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_dutch_oscar23_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_dutch_oscar23_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|464.8 MB| + +## References + +https://huggingface.co/FremyCompany/roberta-base-nl-oscar23 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_finetuned_sleevelength_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_finetuned_sleevelength_en.md new file mode 100644 index 00000000000000..01e7223dc93fee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_finetuned_sleevelength_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_finetuned_sleevelength RoBertaForSequenceClassification from Cournane +author: John Snow Labs +name: roberta_base_finetuned_sleevelength +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_sleevelength` is a English model originally trained by Cournane. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_sleevelength_en_5.5.0_3.0_1727036898412.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_sleevelength_en_5.5.0_3.0_1727036898412.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_finetuned_sleevelength","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_finetuned_sleevelength", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_sleevelength| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|427.2 MB| + +## References + +https://huggingface.co/Cournane/roberta-base-finetuned-SleeveLength \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_oscar_chen_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_oscar_chen_pipeline_en.md new file mode 100644 index 00000000000000..df39bf412024ed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_oscar_chen_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_oscar_chen_pipeline pipeline RoBertaForSequenceClassification from Oscar-chen +author: John Snow Labs +name: roberta_base_oscar_chen_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_oscar_chen_pipeline` is a English model originally trained by Oscar-chen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_oscar_chen_pipeline_en_5.5.0_3.0_1726972198300.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_oscar_chen_pipeline_en_5.5.0_3.0_1726972198300.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_oscar_chen_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_oscar_chen_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_oscar_chen_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|433.2 MB| + +## References + +https://huggingface.co/Oscar-chen/roberta-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_reduced_upper_fabric_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_reduced_upper_fabric_en.md new file mode 100644 index 00000000000000..53466b04685b00 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_reduced_upper_fabric_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_reduced_upper_fabric RoBertaForSequenceClassification from Cournane +author: John Snow Labs +name: roberta_base_reduced_upper_fabric +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_reduced_upper_fabric` is a English model originally trained by Cournane. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_reduced_upper_fabric_en_5.5.0_3.0_1727017731890.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_reduced_upper_fabric_en_5.5.0_3.0_1727017731890.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_reduced_upper_fabric","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_reduced_upper_fabric", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_reduced_upper_fabric| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|427.2 MB| + +## References + +https://huggingface.co/Cournane/roberta-base-reduced-Upper_fabric \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_tweet_topic_multi_2020_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_tweet_topic_multi_2020_pipeline_en.md new file mode 100644 index 00000000000000..a0feb6c5723d4b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_tweet_topic_multi_2020_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_tweet_topic_multi_2020_pipeline pipeline RoBertaForSequenceClassification from cardiffnlp +author: John Snow Labs +name: roberta_base_tweet_topic_multi_2020_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_tweet_topic_multi_2020_pipeline` is a English model originally trained by cardiffnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_tweet_topic_multi_2020_pipeline_en_5.5.0_3.0_1726967470783.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_tweet_topic_multi_2020_pipeline_en_5.5.0_3.0_1726967470783.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_tweet_topic_multi_2020_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_tweet_topic_multi_2020_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_tweet_topic_multi_2020_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|441.7 MB| + +## References + +https://huggingface.co/cardiffnlp/roberta-base-tweet-topic-multi-2020 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_base_wechsel_ukrainian_uk.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_wechsel_ukrainian_uk.md new file mode 100644 index 00000000000000..dfb6cb293a48d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_base_wechsel_ukrainian_uk.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Ukrainian roberta_base_wechsel_ukrainian RoBertaEmbeddings from benjamin +author: John Snow Labs +name: roberta_base_wechsel_ukrainian +date: 2024-09-22 +tags: [uk, open_source, onnx, embeddings, roberta] +task: Embeddings +language: uk +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_wechsel_ukrainian` is a Ukrainian model originally trained by benjamin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_wechsel_ukrainian_uk_5.5.0_3.0_1727041915094.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_wechsel_ukrainian_uk_5.5.0_3.0_1727041915094.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_wechsel_ukrainian","uk") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_wechsel_ukrainian","uk") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_wechsel_ukrainian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|uk| +|Size:|465.9 MB| + +## References + +https://huggingface.co/benjamin/roberta-base-wechsel-ukrainian \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_ingredients_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_ingredients_en.md new file mode 100644 index 00000000000000..c67d3f0a8a6fa2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_ingredients_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_ingredients RoBertaEmbeddings from ggilley +author: John Snow Labs +name: roberta_ingredients +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_ingredients` is a English model originally trained by ggilley. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_ingredients_en_5.5.0_3.0_1727042083306.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_ingredients_en_5.5.0_3.0_1727042083306.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_ingredients","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_ingredients","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_ingredients| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|309.0 MB| + +## References + +https://huggingface.co/ggilley/roberta-ingredients \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-roberta_poetry_religion_crpo_en.md b/docs/_posts/ahmedlone127/2024-09-22-roberta_poetry_religion_crpo_en.md new file mode 100644 index 00000000000000..dbf78d50f7a29f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-roberta_poetry_religion_crpo_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_poetry_religion_crpo RoBertaEmbeddings from andreipb +author: John Snow Labs +name: roberta_poetry_religion_crpo +date: 2024-09-22 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_poetry_religion_crpo` is a English model originally trained by andreipb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_poetry_religion_crpo_en_5.5.0_3.0_1726999704632.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_poetry_religion_crpo_en_5.5.0_3.0_1726999704632.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_poetry_religion_crpo","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_poetry_religion_crpo","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_poetry_religion_crpo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/andreipb/roberta-poetry-religion-crpo \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_english_italian_cased_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_english_italian_cased_en.md new file mode 100644 index 00000000000000..6c7085501d9fb6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_english_italian_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_english_italian_cased BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_italian_cased +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_italian_cased` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_italian_cased_en_5.5.0_3.0_1727047180968.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_italian_cased_en_5.5.0_3.0_1727047180968.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_italian_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_italian_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_italian_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|417.9 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-it-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_finetuned_bible_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_finetuned_bible_pipeline_en.md new file mode 100644 index 00000000000000..d7fe75ea8923cb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_finetuned_bible_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_bible_pipeline pipeline BertSentenceEmbeddings from Pragash-Mohanarajah +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_bible_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_bible_pipeline` is a English model originally trained by Pragash-Mohanarajah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_bible_pipeline_en_5.5.0_3.0_1727001535467.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_bible_pipeline_en_5.5.0_3.0_1727001535467.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_finetuned_bible_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_finetuned_bible_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_bible_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/Pragash-Mohanarajah/bert-base-uncased-finetuned-bible + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_finetuned_wikitext_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_finetuned_wikitext_en.md new file mode 100644 index 00000000000000..b69fd87af36c00 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bert_base_uncased_finetuned_wikitext_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_wikitext BertSentenceEmbeddings from peteryushunli +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_wikitext +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_wikitext` is a English model originally trained by peteryushunli. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wikitext_en_5.5.0_3.0_1727047195540.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wikitext_en_5.5.0_3.0_1727047195540.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_wikitext","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_wikitext","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_wikitext| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/peteryushunli/bert-base-uncased-finetuned-wikitext \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sent_bibert_v0_1_en.md b/docs/_posts/ahmedlone127/2024-09-22-sent_bibert_v0_1_en.md new file mode 100644 index 00000000000000..3a43598efd9fcb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sent_bibert_v0_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bibert_v0_1 BertSentenceEmbeddings from yugen-ok +author: John Snow Labs +name: sent_bibert_v0_1 +date: 2024-09-22 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bibert_v0_1` is a English model originally trained by yugen-ok. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bibert_v0_1_en_5.5.0_3.0_1727044456673.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bibert_v0_1_en_5.5.0_3.0_1727044456673.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bibert_v0_1","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bibert_v0_1","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bibert_v0_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/yugen-ok/bibert-v0.1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_preetham04_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_preetham04_pipeline_en.md new file mode 100644 index 00000000000000..d01519373dc3ff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_preetham04_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_analysis_preetham04_pipeline pipeline BertForSequenceClassification from Preetham04 +author: John Snow Labs +name: sentiment_analysis_preetham04_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_preetham04_pipeline` is a English model originally trained by Preetham04. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_preetham04_pipeline_en_5.5.0_3.0_1727034120954.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_preetham04_pipeline_en_5.5.0_3.0_1727034120954.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentiment_analysis_preetham04_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentiment_analysis_preetham04_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_preetham04_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Preetham04/sentiment-analysis + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_shadez25_en.md b/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_shadez25_en.md new file mode 100644 index 00000000000000..94d1cfe7216b45 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sentiment_analysis_shadez25_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_analysis_shadez25 DistilBertForSequenceClassification from Shadez25 +author: John Snow Labs +name: sentiment_analysis_shadez25 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_shadez25` is a English model originally trained by Shadez25. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_shadez25_en_5.5.0_3.0_1727033492393.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_shadez25_en_5.5.0_3.0_1727033492393.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentiment_analysis_shadez25","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentiment_analysis_shadez25", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_shadez25| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Shadez25/sentiment-analysis \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-sreegeni_finetune_textclass_auto_en.md b/docs/_posts/ahmedlone127/2024-09-22-sreegeni_finetune_textclass_auto_en.md new file mode 100644 index 00000000000000..0e2342d0b3fd42 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-sreegeni_finetune_textclass_auto_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sreegeni_finetune_textclass_auto RoBertaForSequenceClassification from sreerammadhu +author: John Snow Labs +name: sreegeni_finetune_textclass_auto +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sreegeni_finetune_textclass_auto` is a English model originally trained by sreerammadhu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sreegeni_finetune_textclass_auto_en_5.5.0_3.0_1727017243296.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sreegeni_finetune_textclass_auto_en_5.5.0_3.0_1727017243296.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("sreegeni_finetune_textclass_auto","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sreegeni_finetune_textclass_auto", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sreegeni_finetune_textclass_auto| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/sreerammadhu/sreegeni-finetune-textclass-auto \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-stance_gottbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-stance_gottbert_pipeline_en.md new file mode 100644 index 00000000000000..41a3ac15f4f950 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-stance_gottbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English stance_gottbert_pipeline pipeline RoBertaForSequenceClassification from ogoshi2000 +author: John Snow Labs +name: stance_gottbert_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stance_gottbert_pipeline` is a English model originally trained by ogoshi2000. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stance_gottbert_pipeline_en_5.5.0_3.0_1726972007326.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stance_gottbert_pipeline_en_5.5.0_3.0_1726972007326.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("stance_gottbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("stance_gottbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stance_gottbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|472.9 MB| + +## References + +https://huggingface.co/ogoshi2000/stance-gottbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-strategytransitionplanv1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-strategytransitionplanv1_pipeline_en.md new file mode 100644 index 00000000000000..4d1e620e7e6cd4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-strategytransitionplanv1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English strategytransitionplanv1_pipeline pipeline RoBertaForSequenceClassification from lomov +author: John Snow Labs +name: strategytransitionplanv1_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`strategytransitionplanv1_pipeline` is a English model originally trained by lomov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/strategytransitionplanv1_pipeline_en_5.5.0_3.0_1727017370066.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/strategytransitionplanv1_pipeline_en_5.5.0_3.0_1727017370066.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("strategytransitionplanv1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("strategytransitionplanv1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|strategytransitionplanv1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|309.7 MB| + +## References + +https://huggingface.co/lomov/strategytransitionplanv1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-stsb_tinybert_l_4_finetuned_auc_151221_top3_op2_en.md b/docs/_posts/ahmedlone127/2024-09-22-stsb_tinybert_l_4_finetuned_auc_151221_top3_op2_en.md new file mode 100644 index 00000000000000..247006eee7f521 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-stsb_tinybert_l_4_finetuned_auc_151221_top3_op2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English stsb_tinybert_l_4_finetuned_auc_151221_top3_op2 BertForSequenceClassification from Katsiaryna +author: John Snow Labs +name: stsb_tinybert_l_4_finetuned_auc_151221_top3_op2 +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stsb_tinybert_l_4_finetuned_auc_151221_top3_op2` is a English model originally trained by Katsiaryna. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stsb_tinybert_l_4_finetuned_auc_151221_top3_op2_en_5.5.0_3.0_1727034717612.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stsb_tinybert_l_4_finetuned_auc_151221_top3_op2_en_5.5.0_3.0_1727034717612.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("stsb_tinybert_l_4_finetuned_auc_151221_top3_op2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("stsb_tinybert_l_4_finetuned_auc_151221_top3_op2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stsb_tinybert_l_4_finetuned_auc_151221_top3_op2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|54.2 MB| + +## References + +https://huggingface.co/Katsiaryna/stsb-TinyBERT-L-4-finetuned_auc_151221-top3_op2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-suicide_distilbert_2_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-suicide_distilbert_2_4_pipeline_en.md new file mode 100644 index 00000000000000..a95eb6227f8b83 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-suicide_distilbert_2_4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English suicide_distilbert_2_4_pipeline pipeline DistilBertForSequenceClassification from cuadron11 +author: John Snow Labs +name: suicide_distilbert_2_4_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`suicide_distilbert_2_4_pipeline` is a English model originally trained by cuadron11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/suicide_distilbert_2_4_pipeline_en_5.5.0_3.0_1727012661672.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/suicide_distilbert_2_4_pipeline_en_5.5.0_3.0_1727012661672.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("suicide_distilbert_2_4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("suicide_distilbert_2_4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|suicide_distilbert_2_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/cuadron11/suicide-distilbert-2-4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-swot_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-22-swot_classifier_en.md new file mode 100644 index 00000000000000..6094792ab197b0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-swot_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English swot_classifier DistilBertForSequenceClassification from jcaponigro +author: John Snow Labs +name: swot_classifier +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`swot_classifier` is a English model originally trained by jcaponigro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/swot_classifier_en_5.5.0_3.0_1727012230273.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/swot_classifier_en_5.5.0_3.0_1727012230273.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("swot_classifier","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("swot_classifier", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|swot_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jcaponigro/SWOT_Classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-test_trainer11a_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-test_trainer11a_pipeline_en.md new file mode 100644 index 00000000000000..b5d62145c5b19d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-test_trainer11a_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English test_trainer11a_pipeline pipeline DistilBertForSequenceClassification from SimoneJLaudani +author: John Snow Labs +name: test_trainer11a_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_trainer11a_pipeline` is a English model originally trained by SimoneJLaudani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_trainer11a_pipeline_en_5.5.0_3.0_1727020407533.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_trainer11a_pipeline_en_5.5.0_3.0_1727020407533.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test_trainer11a_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test_trainer11a_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_trainer11a_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/SimoneJLaudani/test_trainer11a + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-trainer2b_en.md b/docs/_posts/ahmedlone127/2024-09-22-trainer2b_en.md new file mode 100644 index 00000000000000..8d211950aaad8b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-trainer2b_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English trainer2b DistilBertForSequenceClassification from SimoneJLaudani +author: John Snow Labs +name: trainer2b +date: 2024-09-22 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trainer2b` is a English model originally trained by SimoneJLaudani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trainer2b_en_5.5.0_3.0_1726980109846.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trainer2b_en_5.5.0_3.0_1726980109846.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("trainer2b","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("trainer2b", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trainer2b| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|410.0 MB| + +## References + +https://huggingface.co/SimoneJLaudani/trainer2b \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-twitter_roberta_base_3epoch10_64_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-twitter_roberta_base_3epoch10_64_pipeline_en.md new file mode 100644 index 00000000000000..ea54e7b0206a81 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-twitter_roberta_base_3epoch10_64_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English twitter_roberta_base_3epoch10_64_pipeline pipeline RoBertaForSequenceClassification from dianamihalache27 +author: John Snow Labs +name: twitter_roberta_base_3epoch10_64_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitter_roberta_base_3epoch10_64_pipeline` is a English model originally trained by dianamihalache27. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_3epoch10_64_pipeline_en_5.5.0_3.0_1727037019441.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_3epoch10_64_pipeline_en_5.5.0_3.0_1727037019441.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("twitter_roberta_base_3epoch10_64_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("twitter_roberta_base_3epoch10_64_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_roberta_base_3epoch10_64_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/dianamihalache27/twitter-roberta-base_3epoch10.64 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_ai_nose_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_ai_nose_en.md new file mode 100644 index 00000000000000..3e28a0723c44a0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_ai_nose_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_ai_nose WhisperForCTC from susmitabhatt +author: John Snow Labs +name: whisper_ai_nose +date: 2024-09-22 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_ai_nose` is a English model originally trained by susmitabhatt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_ai_nose_en_5.5.0_3.0_1727022388766.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_ai_nose_en_5.5.0_3.0_1727022388766.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_ai_nose","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_ai_nose", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_ai_nose| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/susmitabhatt/whisper-ai-nose \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_base_ga2en_v1_1_pipeline_ga.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_base_ga2en_v1_1_pipeline_ga.md new file mode 100644 index 00000000000000..05a847f3b19073 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_base_ga2en_v1_1_pipeline_ga.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Irish whisper_base_ga2en_v1_1_pipeline pipeline WhisperForCTC from ymoslem +author: John Snow Labs +name: whisper_base_ga2en_v1_1_pipeline +date: 2024-09-22 +tags: [ga, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ga +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_ga2en_v1_1_pipeline` is a Irish model originally trained by ymoslem. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_ga2en_v1_1_pipeline_ga_5.5.0_3.0_1727024753949.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_ga2en_v1_1_pipeline_ga_5.5.0_3.0_1727024753949.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_base_ga2en_v1_1_pipeline", lang = "ga") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_base_ga2en_v1_1_pipeline", lang = "ga") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_ga2en_v1_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ga| +|Size:|641.2 MB| + +## References + +https://huggingface.co/ymoslem/whisper-base-ga2en-v1.1 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_atc_san2003m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_atc_san2003m_pipeline_en.md new file mode 100644 index 00000000000000..47622ed9204855 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_atc_san2003m_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_atc_san2003m_pipeline pipeline WhisperForCTC from san2003m +author: John Snow Labs +name: whisper_small_atc_san2003m_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_atc_san2003m_pipeline` is a English model originally trained by san2003m. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_atc_san2003m_pipeline_en_5.5.0_3.0_1726983388523.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_atc_san2003m_pipeline_en_5.5.0_3.0_1726983388523.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_atc_san2003m_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_atc_san2003m_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_atc_san2003m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/san2003m/whisper-small-atc + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_small_hindi_srirama_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_hindi_srirama_pipeline_en.md new file mode 100644 index 00000000000000..d4089dd724d260 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_small_hindi_srirama_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_hindi_srirama_pipeline pipeline WhisperForCTC from srirama +author: John Snow Labs +name: whisper_small_hindi_srirama_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_hindi_srirama_pipeline` is a English model originally trained by srirama. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_srirama_pipeline_en_5.5.0_3.0_1727024625869.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_hindi_srirama_pipeline_en_5.5.0_3.0_1727024625869.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_hindi_srirama_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_hindi_srirama_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_hindi_srirama_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/srirama/whisper-small-hi + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_minds14_english_us_ghassenhannachi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_minds14_english_us_ghassenhannachi_pipeline_en.md new file mode 100644 index 00000000000000..130bc1b6d929d4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-whisper_tiny_minds14_english_us_ghassenhannachi_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_minds14_english_us_ghassenhannachi_pipeline pipeline WhisperForCTC from ghassenhannachi +author: John Snow Labs +name: whisper_tiny_minds14_english_us_ghassenhannachi_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_minds14_english_us_ghassenhannachi_pipeline` is a English model originally trained by ghassenhannachi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_english_us_ghassenhannachi_pipeline_en_5.5.0_3.0_1726995013770.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_english_us_ghassenhannachi_pipeline_en_5.5.0_3.0_1726995013770.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_minds14_english_us_ghassenhannachi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_minds14_english_us_ghassenhannachi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_minds14_english_us_ghassenhannachi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.0 MB| + +## References + +https://huggingface.co/ghassenhannachi/whisper-tiny-minds14-en-us + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_gewissta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_gewissta_pipeline_en.md new file mode 100644 index 00000000000000..0960b73fe6018a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_finetuned_panx_german_french_gewissta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_gewissta_pipeline pipeline XlmRoBertaForTokenClassification from gewissta +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_gewissta_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_gewissta_pipeline` is a English model originally trained by gewissta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_gewissta_pipeline_en_5.5.0_3.0_1727019360738.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_gewissta_pipeline_en_5.5.0_3.0_1727019360738.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_gewissta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_gewissta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_gewissta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/gewissta/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_ft_udpos213_top8lang_southern_sotho_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_ft_udpos213_top8lang_southern_sotho_en.md new file mode 100644 index 00000000000000..75806be067eda2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_ft_udpos213_top8lang_southern_sotho_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_ft_udpos213_top8lang_southern_sotho XlmRoBertaForTokenClassification from iceman2434 +author: John Snow Labs +name: xlm_roberta_base_ft_udpos213_top8lang_southern_sotho +date: 2024-09-22 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_ft_udpos213_top8lang_southern_sotho` is a English model originally trained by iceman2434. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_ft_udpos213_top8lang_southern_sotho_en_5.5.0_3.0_1727019173465.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_ft_udpos213_top8lang_southern_sotho_en_5.5.0_3.0_1727019173465.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_ft_udpos213_top8lang_southern_sotho","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_ft_udpos213_top8lang_southern_sotho", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_ft_udpos213_top8lang_southern_sotho| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|805.3 MB| + +## References + +https://huggingface.co/iceman2434/xlm-roberta-base_ft_udpos213-top8lang-st \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train_pipeline_en.md new file mode 100644 index 00000000000000..49f8ad2507d7cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-22-xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train_pipeline pipeline XlmRoBertaForSequenceClassification from shanhy +author: John Snow Labs +name: xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train_pipeline +date: 2024-09-22 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train_pipeline` is a English model originally trained by shanhy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train_pipeline_en_5.5.0_3.0_1727010189358.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train_pipeline_en_5.5.0_3.0_1727010189358.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_lr2e_05_seed42_kinyarwanda_hau_eng_train_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|800.8 MB| + +## References + +https://huggingface.co/shanhy/xlm-roberta-base_lr2e-05_seed42_kin-hau-eng_train + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-1030_en.md b/docs/_posts/ahmedlone127/2024-09-23-1030_en.md new file mode 100644 index 00000000000000..d53e7fb06d91b3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-1030_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 1030 DistilBertForSequenceClassification from tingchih +author: John Snow Labs +name: 1030 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`1030` is a English model originally trained by tingchih. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/1030_en_5.5.0_3.0_1727108743974.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/1030_en_5.5.0_3.0_1727108743974.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("1030","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("1030", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|1030| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tingchih/1030 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-2020_q1_25p_filtered_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-2020_q1_25p_filtered_pipeline_en.md new file mode 100644 index 00000000000000..8d2d9036a4fc76 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-2020_q1_25p_filtered_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 2020_q1_25p_filtered_pipeline pipeline RoBertaEmbeddings from DouglasPontes +author: John Snow Labs +name: 2020_q1_25p_filtered_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q1_25p_filtered_pipeline` is a English model originally trained by DouglasPontes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q1_25p_filtered_pipeline_en_5.5.0_3.0_1727121920175.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q1_25p_filtered_pipeline_en_5.5.0_3.0_1727121920175.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("2020_q1_25p_filtered_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("2020_q1_25p_filtered_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q1_25p_filtered_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/DouglasPontes/2020-Q1-25p-filtered + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-3_roberta_0_en.md b/docs/_posts/ahmedlone127/2024-09-23-3_roberta_0_en.md new file mode 100644 index 00000000000000..01923a1a992aa1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-3_roberta_0_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 3_roberta_0 RoBertaForSequenceClassification from prl90777 +author: John Snow Labs +name: 3_roberta_0 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`3_roberta_0` is a English model originally trained by prl90777. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/3_roberta_0_en_5.5.0_3.0_1727055510163.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/3_roberta_0_en_5.5.0_3.0_1727055510163.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("3_roberta_0","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("3_roberta_0", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|3_roberta_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|449.2 MB| + +## References + +https://huggingface.co/prl90777/3_roberta_0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-aia_hw01_qian_wu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-aia_hw01_qian_wu_pipeline_en.md new file mode 100644 index 00000000000000..9ba8c13c5c6016 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-aia_hw01_qian_wu_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English aia_hw01_qian_wu_pipeline pipeline DistilBertForSequenceClassification from Qian-Wu +author: John Snow Labs +name: aia_hw01_qian_wu_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`aia_hw01_qian_wu_pipeline` is a English model originally trained by Qian-Wu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/aia_hw01_qian_wu_pipeline_en_5.5.0_3.0_1727097132445.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/aia_hw01_qian_wu_pipeline_en_5.5.0_3.0_1727097132445.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("aia_hw01_qian_wu_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("aia_hw01_qian_wu_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|aia_hw01_qian_wu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Qian-Wu/AIA_HW01 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-amazon_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-amazon_0_pipeline_en.md new file mode 100644 index 00000000000000..c7150c41c5ae15 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-amazon_0_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English amazon_0_pipeline pipeline DistilBertForSequenceClassification from draghicivlad +author: John Snow Labs +name: amazon_0_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`amazon_0_pipeline` is a English model originally trained by draghicivlad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/amazon_0_pipeline_en_5.5.0_3.0_1727108757578.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/amazon_0_pipeline_en_5.5.0_3.0_1727108757578.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("amazon_0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("amazon_0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|amazon_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/draghicivlad/amazon_0 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-apps2_en.md b/docs/_posts/ahmedlone127/2024-09-23-apps2_en.md new file mode 100644 index 00000000000000..0673555019a324 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-apps2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English apps2 DistilBertForSequenceClassification from Frana9812 +author: John Snow Labs +name: apps2 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`apps2` is a English model originally trained by Frana9812. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/apps2_en_5.5.0_3.0_1727094073990.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/apps2_en_5.5.0_3.0_1727094073990.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("apps2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("apps2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|apps2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Frana9812/apps2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-araroberta_luxembourgish_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-23-araroberta_luxembourgish_pipeline_ar.md new file mode 100644 index 00000000000000..4729346d1ca87c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-araroberta_luxembourgish_pipeline_ar.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Arabic araroberta_luxembourgish_pipeline pipeline RoBertaEmbeddings from reemalyami +author: John Snow Labs +name: araroberta_luxembourgish_pipeline +date: 2024-09-23 +tags: [ar, open_source, pipeline, onnx] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`araroberta_luxembourgish_pipeline` is a Arabic model originally trained by reemalyami. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/araroberta_luxembourgish_pipeline_ar_5.5.0_3.0_1727121682180.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/araroberta_luxembourgish_pipeline_ar_5.5.0_3.0_1727121682180.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("araroberta_luxembourgish_pipeline", lang = "ar") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("araroberta_luxembourgish_pipeline", lang = "ar") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|araroberta_luxembourgish_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|470.6 MB| + +## References + +https://huggingface.co/reemalyami/AraRoBERTa-LB + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_cased_fake_news_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_cased_fake_news_pipeline_en.md new file mode 100644 index 00000000000000..d16897c64768b5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_cased_fake_news_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_cased_fake_news_pipeline pipeline BertForSequenceClassification from elozano +author: John Snow Labs +name: bert_base_cased_fake_news_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_fake_news_pipeline` is a English model originally trained by elozano. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_fake_news_pipeline_en_5.5.0_3.0_1727095308660.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_fake_news_pipeline_en_5.5.0_3.0_1727095308660.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_cased_fake_news_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_cased_fake_news_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_fake_news_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/elozano/bert-base-cased-fake-news + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_multilingual_cased_ner_hrl_nttaii_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_multilingual_cased_ner_hrl_nttaii_pipeline_xx.md new file mode 100644 index 00000000000000..317f43844b4ed5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_multilingual_cased_ner_hrl_nttaii_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_cased_ner_hrl_nttaii_pipeline pipeline BertForTokenClassification from nttaii +author: John Snow Labs +name: bert_base_multilingual_cased_ner_hrl_nttaii_pipeline +date: 2024-09-23 +tags: [xx, open_source, pipeline, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_cased_ner_hrl_nttaii_pipeline` is a Multilingual model originally trained by nttaii. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_ner_hrl_nttaii_pipeline_xx_5.5.0_3.0_1727060451951.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_ner_hrl_nttaii_pipeline_xx_5.5.0_3.0_1727060451951.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_multilingual_cased_ner_hrl_nttaii_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_multilingual_cased_ner_hrl_nttaii_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_cased_ner_hrl_nttaii_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|665.3 MB| + +## References + +https://huggingface.co/nttaii/bert-base-multilingual-cased-ner-hrl + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111_pipeline_en.md new file mode 100644 index 00000000000000..ea310d5409a402 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111_pipeline pipeline BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111_pipeline` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111_pipeline_en_5.5.0_3.0_1727127906219.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111_pipeline_en_5.5.0_3.0_1727127906219.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_220240904191111_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.220240904191111 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0_en.md new file mode 100644 index 00000000000000..b346c78565324e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0 +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0_en_5.5.0_3.0_1727127747505.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0_en_5.5.0_3.0_1727127747505.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_1_45_b_32_lr_1_2e_06_dp_0_3_swati_300_southern_sotho_false_fh_true_hs_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-1.45-b-32-lr-1.2e-06-dp-0.3-ss-300-st-False-fh-True-hs-0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666_pipeline_en.md new file mode 100644 index 00000000000000..c73448344b1267 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666_pipeline_en_5.5.0_3.0_1727050091038.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666_pipeline_en_5.5.0_3.0_1727050091038.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_2_swati_2882_southern_sotho_false_fh_true_hs_666_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-1e-06-wd-0.001-dp-0.2-ss-2882-st-False-fh-True-hs-666 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999_pipeline_en.md new file mode 100644 index 00000000000000..068ea65942c59e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999_pipeline_en_5.5.0_3.0_1727049778187.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999_pipeline_en_5.5.0_3.0_1727049778187.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_8e_06_wd_0_001_dp_0_999_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-8e-06-wd-0.001-dp-0.999 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_newscategoryclassification_fullmodel_3_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_newscategoryclassification_fullmodel_3_en.md new file mode 100644 index 00000000000000..fd1ebb3c205d54 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_base_uncased_newscategoryclassification_fullmodel_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_newscategoryclassification_fullmodel_3 DistilBertForSequenceClassification from akashmaggon +author: John Snow Labs +name: bert_base_uncased_newscategoryclassification_fullmodel_3 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_newscategoryclassification_fullmodel_3` is a English model originally trained by akashmaggon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_newscategoryclassification_fullmodel_3_en_5.5.0_3.0_1727059857892.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_newscategoryclassification_fullmodel_3_en_5.5.0_3.0_1727059857892.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_base_uncased_newscategoryclassification_fullmodel_3","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_base_uncased_newscategoryclassification_fullmodel_3", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_newscategoryclassification_fullmodel_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/akashmaggon/bert-base-uncased-newscategoryclassification-fullmodel-3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_finetuned_368items_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_finetuned_368items_pipeline_en.md new file mode 100644 index 00000000000000..30bee3b6e7fd3f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_finetuned_368items_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_finetuned_368items_pipeline pipeline BertForSequenceClassification from luminar9 +author: John Snow Labs +name: bert_finetuned_368items_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_368items_pipeline` is a English model originally trained by luminar9. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_368items_pipeline_en_5.5.0_3.0_1727095959274.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_368items_pipeline_en_5.5.0_3.0_1727095959274.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_finetuned_368items_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_finetuned_368items_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_368items_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/luminar9/bert-finetuned-368items + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_finetuned_squad_vidyuth_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_finetuned_squad_vidyuth_en.md new file mode 100644 index 00000000000000..8ae40bdac26111 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_finetuned_squad_vidyuth_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_finetuned_squad_vidyuth BertForQuestionAnswering from Vidyuth +author: John Snow Labs +name: bert_finetuned_squad_vidyuth +date: 2024-09-23 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_squad_vidyuth` is a English model originally trained by Vidyuth. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_vidyuth_en_5.5.0_3.0_1727128425852.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_vidyuth_en_5.5.0_3.0_1727128425852.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_finetuned_squad_vidyuth","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_finetuned_squad_vidyuth", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_squad_vidyuth| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Vidyuth/bert-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_finetuned_squad_vidyuth_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_finetuned_squad_vidyuth_pipeline_en.md new file mode 100644 index 00000000000000..f902c675806357 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_finetuned_squad_vidyuth_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_finetuned_squad_vidyuth_pipeline pipeline BertForQuestionAnswering from Vidyuth +author: John Snow Labs +name: bert_finetuned_squad_vidyuth_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_squad_vidyuth_pipeline` is a English model originally trained by Vidyuth. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_vidyuth_pipeline_en_5.5.0_3.0_1727128486255.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_vidyuth_pipeline_en_5.5.0_3.0_1727128486255.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_finetuned_squad_vidyuth_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_finetuned_squad_vidyuth_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_squad_vidyuth_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Vidyuth/bert-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_large_cased_finetuned_conll03_english_finetuned_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_large_cased_finetuned_conll03_english_finetuned_en.md new file mode 100644 index 00000000000000..23ebba56c13db0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_large_cased_finetuned_conll03_english_finetuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_large_cased_finetuned_conll03_english_finetuned BertForTokenClassification from alban12 +author: John Snow Labs +name: bert_large_cased_finetuned_conll03_english_finetuned +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_cased_finetuned_conll03_english_finetuned` is a English model originally trained by alban12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_cased_finetuned_conll03_english_finetuned_en_5.5.0_3.0_1727111378582.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_cased_finetuned_conll03_english_finetuned_en_5.5.0_3.0_1727111378582.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_large_cased_finetuned_conll03_english_finetuned","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_large_cased_finetuned_conll03_english_finetuned", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_cased_finetuned_conll03_english_finetuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/alban12/bert-large-cased-finetuned-conll03-english-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_large_cased_finetuned_conll03_english_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bert_large_cased_finetuned_conll03_english_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..07bbb335805175 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_large_cased_finetuned_conll03_english_finetuned_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_large_cased_finetuned_conll03_english_finetuned_pipeline pipeline BertForTokenClassification from alban12 +author: John Snow Labs +name: bert_large_cased_finetuned_conll03_english_finetuned_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_cased_finetuned_conll03_english_finetuned_pipeline` is a English model originally trained by alban12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_cased_finetuned_conll03_english_finetuned_pipeline_en_5.5.0_3.0_1727111437539.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_cased_finetuned_conll03_english_finetuned_pipeline_en_5.5.0_3.0_1727111437539.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_large_cased_finetuned_conll03_english_finetuned_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_large_cased_finetuned_conll03_english_finetuned_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_cased_finetuned_conll03_english_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/alban12/bert-large-cased-finetuned-conll03-english-finetuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bert_massa_es.md b/docs/_posts/ahmedlone127/2024-09-23-bert_massa_es.md new file mode 100644 index 00000000000000..00f7756686ad37 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bert_massa_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish bert_massa XlmRoBertaForSequenceClassification from nmarinnn +author: John Snow Labs +name: bert_massa +date: 2024-09-23 +tags: [es, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_massa` is a Castilian, Spanish model originally trained by nmarinnn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_massa_es_5.5.0_3.0_1727126108155.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_massa_es_5.5.0_3.0_1727126108155.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("bert_massa","es") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("bert_massa", "es") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_massa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|es| +|Size:|1.0 GB| + +## References + +https://huggingface.co/nmarinnn/bert-massa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bloom_question_classification_en.md b/docs/_posts/ahmedlone127/2024-09-23-bloom_question_classification_en.md new file mode 100644 index 00000000000000..87bb4eb0b09876 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bloom_question_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bloom_question_classification DistilBertForSequenceClassification from MinervaBotTeam +author: John Snow Labs +name: bloom_question_classification +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bloom_question_classification` is a English model originally trained by MinervaBotTeam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bloom_question_classification_en_5.5.0_3.0_1727108414330.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bloom_question_classification_en_5.5.0_3.0_1727108414330.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("bloom_question_classification","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bloom_question_classification", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bloom_question_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/MinervaBotTeam/Bloom_Question_Classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bloom_question_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-bloom_question_classification_pipeline_en.md new file mode 100644 index 00000000000000..9224ed0319bfa4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bloom_question_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bloom_question_classification_pipeline pipeline DistilBertForSequenceClassification from MinervaBotTeam +author: John Snow Labs +name: bloom_question_classification_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bloom_question_classification_pipeline` is a English model originally trained by MinervaBotTeam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bloom_question_classification_pipeline_en_5.5.0_3.0_1727108426660.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bloom_question_classification_pipeline_en_5.5.0_3.0_1727108426660.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bloom_question_classification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bloom_question_classification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bloom_question_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/MinervaBotTeam/Bloom_Question_Classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bpe_selfies_pubchem_shard00_70k_en.md b/docs/_posts/ahmedlone127/2024-09-23-bpe_selfies_pubchem_shard00_70k_en.md new file mode 100644 index 00000000000000..ab6e09206c6693 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bpe_selfies_pubchem_shard00_70k_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bpe_selfies_pubchem_shard00_70k RoBertaEmbeddings from seyonec +author: John Snow Labs +name: bpe_selfies_pubchem_shard00_70k +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bpe_selfies_pubchem_shard00_70k` is a English model originally trained by seyonec. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bpe_selfies_pubchem_shard00_70k_en_5.5.0_3.0_1727092327447.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bpe_selfies_pubchem_shard00_70k_en_5.5.0_3.0_1727092327447.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("bpe_selfies_pubchem_shard00_70k","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("bpe_selfies_pubchem_shard00_70k","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bpe_selfies_pubchem_shard00_70k| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|309.5 MB| + +## References + +https://huggingface.co/seyonec/BPE_SELFIES_PubChem_shard00_70k \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_symptemist_word2vec_85_ner_en.md b/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_symptemist_word2vec_85_ner_en.md new file mode 100644 index 00000000000000..52368f14768672 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-bsc_bio_ehr_spanish_symptemist_word2vec_85_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bsc_bio_ehr_spanish_symptemist_word2vec_85_ner RoBertaForTokenClassification from Rodrigo1771 +author: John Snow Labs +name: bsc_bio_ehr_spanish_symptemist_word2vec_85_ner +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_symptemist_word2vec_85_ner` is a English model originally trained by Rodrigo1771. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_symptemist_word2vec_85_ner_en_5.5.0_3.0_1727115179187.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_symptemist_word2vec_85_ner_en_5.5.0_3.0_1727115179187.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_symptemist_word2vec_85_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_symptemist_word2vec_85_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_symptemist_word2vec_85_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|435.0 MB| + +## References + +https://huggingface.co/Rodrigo1771/bsc-bio-ehr-es-symptemist-word2vec-85-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_abishines_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_abishines_pipeline_en.md new file mode 100644 index 00000000000000..d1ea3b977cdcfc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_eli5_mlm_model_abishines_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_abishines_pipeline pipeline RoBertaEmbeddings from abishines +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_abishines_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_abishines_pipeline` is a English model originally trained by abishines. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_abishines_pipeline_en_5.5.0_3.0_1727091996203.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_abishines_pipeline_en_5.5.0_3.0_1727091996203.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_eli5_mlm_model_abishines_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_eli5_mlm_model_abishines_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_abishines_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/abishines/my_awesome_eli5_mlm_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_beto_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_beto_en.md new file mode 100644 index 00000000000000..3000f21026bc2f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_beto_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_beto BertForSequenceClassification from maic1995 +author: John Snow Labs +name: burmese_awesome_model_beto +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_beto` is a English model originally trained by maic1995. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_beto_en_5.5.0_3.0_1727095427947.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_beto_en_5.5.0_3.0_1727095427947.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("burmese_awesome_model_beto","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("burmese_awesome_model_beto", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_beto| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|411.9 MB| + +## References + +https://huggingface.co/maic1995/my_awesome_model_beto \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_eyeonyou_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_eyeonyou_pipeline_en.md new file mode 100644 index 00000000000000..ed64509c203950 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_eyeonyou_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_eyeonyou_pipeline pipeline DistilBertForSequenceClassification from eyeonyou +author: John Snow Labs +name: burmese_awesome_model_eyeonyou_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_eyeonyou_pipeline` is a English model originally trained by eyeonyou. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_eyeonyou_pipeline_en_5.5.0_3.0_1727110514298.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_eyeonyou_pipeline_en_5.5.0_3.0_1727110514298.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_eyeonyou_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_eyeonyou_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_eyeonyou_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/eyeonyou/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_habiba_2227_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_habiba_2227_pipeline_en.md new file mode 100644 index 00000000000000..ecca5f6cd3a031 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_habiba_2227_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_habiba_2227_pipeline pipeline DistilBertForSequenceClassification from habiba-2227 +author: John Snow Labs +name: burmese_awesome_model_habiba_2227_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_habiba_2227_pipeline` is a English model originally trained by habiba-2227. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_habiba_2227_pipeline_en_5.5.0_3.0_1727059396976.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_habiba_2227_pipeline_en_5.5.0_3.0_1727059396976.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_habiba_2227_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_habiba_2227_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_habiba_2227_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/habiba-2227/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_julianorosco37_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_julianorosco37_pipeline_en.md new file mode 100644 index 00000000000000..b75a2d3c8d4658 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_julianorosco37_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_julianorosco37_pipeline pipeline DistilBertForSequenceClassification from Julianorosco37 +author: John Snow Labs +name: burmese_awesome_model_julianorosco37_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_julianorosco37_pipeline` is a English model originally trained by Julianorosco37. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_julianorosco37_pipeline_en_5.5.0_3.0_1727082364112.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_julianorosco37_pipeline_en_5.5.0_3.0_1727082364112.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_julianorosco37_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_julianorosco37_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_julianorosco37_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Julianorosco37/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_tsibbett_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_tsibbett_pipeline_en.md new file mode 100644 index 00000000000000..3fa715b01a4ba8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_tsibbett_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_tsibbett_pipeline pipeline DistilBertForSequenceClassification from tsibbett +author: John Snow Labs +name: burmese_awesome_model_tsibbett_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_tsibbett_pipeline` is a English model originally trained by tsibbett. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_tsibbett_pipeline_en_5.5.0_3.0_1727082302505.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_tsibbett_pipeline_en_5.5.0_3.0_1727082302505.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_tsibbett_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_tsibbett_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_tsibbett_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tsibbett/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_zeckhardt_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_zeckhardt_en.md new file mode 100644 index 00000000000000..4a4f6206972f20 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_zeckhardt_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_zeckhardt DistilBertForSequenceClassification from zeckhardt +author: John Snow Labs +name: burmese_awesome_model_zeckhardt +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_zeckhardt` is a English model originally trained by zeckhardt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_zeckhardt_en_5.5.0_3.0_1727097140210.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_zeckhardt_en_5.5.0_3.0_1727097140210.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_zeckhardt","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_zeckhardt", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_zeckhardt| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/zeckhardt/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_zeckhardt_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_zeckhardt_pipeline_en.md new file mode 100644 index 00000000000000..5dac3fc85f87b0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_model_zeckhardt_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_zeckhardt_pipeline pipeline DistilBertForSequenceClassification from zeckhardt +author: John Snow Labs +name: burmese_awesome_model_zeckhardt_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_zeckhardt_pipeline` is a English model originally trained by zeckhardt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_zeckhardt_pipeline_en_5.5.0_3.0_1727097152156.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_zeckhardt_pipeline_en_5.5.0_3.0_1727097152156.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_zeckhardt_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_zeckhardt_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_zeckhardt_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/zeckhardt/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_qa_model_dennischan_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_qa_model_dennischan_pipeline_en.md new file mode 100644 index 00000000000000..80bca63d652869 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_awesome_qa_model_dennischan_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_awesome_qa_model_dennischan_pipeline pipeline BertForQuestionAnswering from dennischan +author: John Snow Labs +name: burmese_awesome_qa_model_dennischan_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_qa_model_dennischan_pipeline` is a English model originally trained by dennischan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_dennischan_pipeline_en_5.5.0_3.0_1727049952851.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_qa_model_dennischan_pipeline_en_5.5.0_3.0_1727049952851.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_qa_model_dennischan_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_qa_model_dennischan_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_qa_model_dennischan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/dennischan/my_awesome_qa_model + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-burmese_textclassification_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-burmese_textclassification_model_pipeline_en.md new file mode 100644 index 00000000000000..71fa91ebce29de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-burmese_textclassification_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_textclassification_model_pipeline pipeline DistilBertForSequenceClassification from Happpy0413 +author: John Snow Labs +name: burmese_textclassification_model_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_textclassification_model_pipeline` is a English model originally trained by Happpy0413. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_textclassification_model_pipeline_en_5.5.0_3.0_1727059133075.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_textclassification_model_pipeline_en_5.5.0_3.0_1727059133075.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_textclassification_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_textclassification_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_textclassification_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Happpy0413/my_textclassification_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-classifier_tensorride_en.md b/docs/_posts/ahmedlone127/2024-09-23-classifier_tensorride_en.md new file mode 100644 index 00000000000000..d1a87f5ae52af6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-classifier_tensorride_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English classifier_tensorride DistilBertForSequenceClassification from Tensorride +author: John Snow Labs +name: classifier_tensorride +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classifier_tensorride` is a English model originally trained by Tensorride. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classifier_tensorride_en_5.5.0_3.0_1727059238148.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classifier_tensorride_en_5.5.0_3.0_1727059238148.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("classifier_tensorride","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("classifier_tensorride", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classifier_tensorride| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Tensorride/Classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-cold_fusion_itr23_seed1_en.md b/docs/_posts/ahmedlone127/2024-09-23-cold_fusion_itr23_seed1_en.md new file mode 100644 index 00000000000000..ec937274739976 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-cold_fusion_itr23_seed1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cold_fusion_itr23_seed1 RoBertaForSequenceClassification from ibm +author: John Snow Labs +name: cold_fusion_itr23_seed1 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cold_fusion_itr23_seed1` is a English model originally trained by ibm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cold_fusion_itr23_seed1_en_5.5.0_3.0_1727135590809.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cold_fusion_itr23_seed1_en_5.5.0_3.0_1727135590809.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("cold_fusion_itr23_seed1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("cold_fusion_itr23_seed1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cold_fusion_itr23_seed1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.0 MB| + +## References + +https://huggingface.co/ibm/ColD-Fusion-itr23-seed1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-cold_fusion_itr25_seed4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-cold_fusion_itr25_seed4_pipeline_en.md new file mode 100644 index 00000000000000..ab4bb3d28ce5e4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-cold_fusion_itr25_seed4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cold_fusion_itr25_seed4_pipeline pipeline RoBertaForSequenceClassification from ibm +author: John Snow Labs +name: cold_fusion_itr25_seed4_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cold_fusion_itr25_seed4_pipeline` is a English model originally trained by ibm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cold_fusion_itr25_seed4_pipeline_en_5.5.0_3.0_1727134714843.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cold_fusion_itr25_seed4_pipeline_en_5.5.0_3.0_1727134714843.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cold_fusion_itr25_seed4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cold_fusion_itr25_seed4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cold_fusion_itr25_seed4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.0 MB| + +## References + +https://huggingface.co/ibm/ColD-Fusion-itr25-seed4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-covid_roberta_25_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-covid_roberta_25_pipeline_en.md new file mode 100644 index 00000000000000..d318073dd50d8b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-covid_roberta_25_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English covid_roberta_25_pipeline pipeline RoBertaEmbeddings from timoneda +author: John Snow Labs +name: covid_roberta_25_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`covid_roberta_25_pipeline` is a English model originally trained by timoneda. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/covid_roberta_25_pipeline_en_5.5.0_3.0_1727092114506.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/covid_roberta_25_pipeline_en_5.5.0_3.0_1727092114506.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("covid_roberta_25_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("covid_roberta_25_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|covid_roberta_25_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/timoneda/covid_roberta_25 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-cyber_distilbert_en.md b/docs/_posts/ahmedlone127/2024-09-23-cyber_distilbert_en.md new file mode 100644 index 00000000000000..7735149dd1bbb3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-cyber_distilbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cyber_distilbert DistilBertForSequenceClassification from eysharaazia +author: John Snow Labs +name: cyber_distilbert +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cyber_distilbert` is a English model originally trained by eysharaazia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cyber_distilbert_en_5.5.0_3.0_1727093590628.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cyber_distilbert_en_5.5.0_3.0_1727093590628.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("cyber_distilbert","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("cyber_distilbert", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cyber_distilbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/eysharaazia/cyber_distilbert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-cyber_distilbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-cyber_distilbert_pipeline_en.md new file mode 100644 index 00000000000000..70dfd6b077c85f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-cyber_distilbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cyber_distilbert_pipeline pipeline DistilBertForSequenceClassification from eysharaazia +author: John Snow Labs +name: cyber_distilbert_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cyber_distilbert_pipeline` is a English model originally trained by eysharaazia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cyber_distilbert_pipeline_en_5.5.0_3.0_1727093615342.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cyber_distilbert_pipeline_en_5.5.0_3.0_1727093615342.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cyber_distilbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cyber_distilbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cyber_distilbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/eysharaazia/cyber_distilbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-db_mc_9_2_en.md b/docs/_posts/ahmedlone127/2024-09-23-db_mc_9_2_en.md new file mode 100644 index 00000000000000..d44a7337fc25b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-db_mc_9_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English db_mc_9_2 DistilBertForSequenceClassification from exala +author: John Snow Labs +name: db_mc_9_2 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`db_mc_9_2` is a English model originally trained by exala. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/db_mc_9_2_en_5.5.0_3.0_1727059224175.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/db_mc_9_2_en_5.5.0_3.0_1727059224175.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("db_mc_9_2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("db_mc_9_2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|db_mc_9_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/exala/db_mc_9.2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-denilsenaxel_xlm_roberta_finetuned_language_detection_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-denilsenaxel_xlm_roberta_finetuned_language_detection_pipeline_en.md new file mode 100644 index 00000000000000..788e12bfb28ad2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-denilsenaxel_xlm_roberta_finetuned_language_detection_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English denilsenaxel_xlm_roberta_finetuned_language_detection_pipeline pipeline XlmRoBertaForSequenceClassification from DenilsenAxel +author: John Snow Labs +name: denilsenaxel_xlm_roberta_finetuned_language_detection_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`denilsenaxel_xlm_roberta_finetuned_language_detection_pipeline` is a English model originally trained by DenilsenAxel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/denilsenaxel_xlm_roberta_finetuned_language_detection_pipeline_en_5.5.0_3.0_1727126484919.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/denilsenaxel_xlm_roberta_finetuned_language_detection_pipeline_en_5.5.0_3.0_1727126484919.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("denilsenaxel_xlm_roberta_finetuned_language_detection_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("denilsenaxel_xlm_roberta_finetuned_language_detection_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|denilsenaxel_xlm_roberta_finetuned_language_detection_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|792.3 MB| + +## References + +https://huggingface.co/DenilsenAxel/denilsenaxel-xlm-roberta-finetuned-language-detection + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-detector_god2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-detector_god2_pipeline_en.md new file mode 100644 index 00000000000000..0a75f67960d0f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-detector_god2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English detector_god2_pipeline pipeline XlmRoBertaForSequenceClassification from Sydelabs +author: John Snow Labs +name: detector_god2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`detector_god2_pipeline` is a English model originally trained by Sydelabs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/detector_god2_pipeline_en_5.5.0_3.0_1727088314880.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/detector_god2_pipeline_en_5.5.0_3.0_1727088314880.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("detector_god2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("detector_god2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|detector_god2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.1 GB| + +## References + +https://huggingface.co/Sydelabs/detector_god2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_agnews_padding80model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_agnews_padding80model_pipeline_en.md new file mode 100644 index 00000000000000..cf7e734733bb92 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_agnews_padding80model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_agnews_padding80model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_agnews_padding80model_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_agnews_padding80model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_agnews_padding80model_pipeline_en_5.5.0_3.0_1727059846344.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_agnews_padding80model_pipeline_en_5.5.0_3.0_1727059846344.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_agnews_padding80model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_agnews_padding80model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_agnews_padding80model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_agnews_padding80model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_cased_hatespeech_ft_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_cased_hatespeech_ft_pipeline_en.md new file mode 100644 index 00000000000000..f72c9b878ba9fc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_cased_hatespeech_ft_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_cased_hatespeech_ft_pipeline pipeline DistilBertForSequenceClassification from EgehanEralp +author: John Snow Labs +name: distilbert_base_cased_hatespeech_ft_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_cased_hatespeech_ft_pipeline` is a English model originally trained by EgehanEralp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_hatespeech_ft_pipeline_en_5.5.0_3.0_1727082648897.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_hatespeech_ft_pipeline_en_5.5.0_3.0_1727082648897.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_cased_hatespeech_ft_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_cased_hatespeech_ft_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_cased_hatespeech_ft_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/EgehanEralp/distilbert-base-cased-hatespeech-ft + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_finetuned_chnsenticorp_chinese_pipeline_zh.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_finetuned_chnsenticorp_chinese_pipeline_zh.md new file mode 100644 index 00000000000000..2374f3392b0c4e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_finetuned_chnsenticorp_chinese_pipeline_zh.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Chinese distilbert_base_finetuned_chnsenticorp_chinese_pipeline pipeline DistilBertForSequenceClassification from WangA +author: John Snow Labs +name: distilbert_base_finetuned_chnsenticorp_chinese_pipeline +date: 2024-09-23 +tags: [zh, open_source, pipeline, onnx] +task: Text Classification +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_finetuned_chnsenticorp_chinese_pipeline` is a Chinese model originally trained by WangA. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_finetuned_chnsenticorp_chinese_pipeline_zh_5.5.0_3.0_1727082583587.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_finetuned_chnsenticorp_chinese_pipeline_zh_5.5.0_3.0_1727082583587.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_finetuned_chnsenticorp_chinese_pipeline", lang = "zh") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_finetuned_chnsenticorp_chinese_pipeline", lang = "zh") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_finetuned_chnsenticorp_chinese_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|zh| +|Size:|507.6 MB| + +## References + +https://huggingface.co/WangA/distilbert-base-finetuned-chnsenticorp-chinese + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_finetuned_chnsenticorp_chinese_zh.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_finetuned_chnsenticorp_chinese_zh.md new file mode 100644 index 00000000000000..1cb801a579c718 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_finetuned_chnsenticorp_chinese_zh.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Chinese distilbert_base_finetuned_chnsenticorp_chinese DistilBertForSequenceClassification from WangA +author: John Snow Labs +name: distilbert_base_finetuned_chnsenticorp_chinese +date: 2024-09-23 +tags: [zh, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_finetuned_chnsenticorp_chinese` is a Chinese model originally trained by WangA. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_finetuned_chnsenticorp_chinese_zh_5.5.0_3.0_1727082557787.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_finetuned_chnsenticorp_chinese_zh_5.5.0_3.0_1727082557787.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_finetuned_chnsenticorp_chinese","zh") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_finetuned_chnsenticorp_chinese", "zh") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_finetuned_chnsenticorp_chinese| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|zh| +|Size:|507.6 MB| + +## References + +https://huggingface.co/WangA/distilbert-base-finetuned-chnsenticorp-chinese \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_3epoch10_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_3epoch10_en.md new file mode 100644 index 00000000000000..27403cf5ac2a61 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_3epoch10_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_3epoch10 DistilBertForSequenceClassification from dianamihalache27 +author: John Snow Labs +name: distilbert_base_uncased_3epoch10 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_3epoch10` is a English model originally trained by dianamihalache27. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_3epoch10_en_5.5.0_3.0_1727093918373.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_3epoch10_en_5.5.0_3.0_1727093918373.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_3epoch10","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_3epoch10", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_3epoch10| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/dianamihalache27/distilbert-base-uncased_3epoch10 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi_pipeline_en.md new file mode 100644 index 00000000000000..0058f98fe2d32b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi_pipeline pipeline DistilBertForSequenceClassification from Sohaibsoussi +author: John Snow Labs +name: distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi_pipeline` is a English model originally trained by Sohaibsoussi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi_pipeline_en_5.5.0_3.0_1727087228194.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi_pipeline_en_5.5.0_3.0_1727087228194.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_distilled_finetuned_clinc_sohaibsoussi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/Sohaibsoussi/distilbert-base-uncased-distilled-finetuned-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cc_pipeline_en.md new file mode 100644 index 00000000000000..04ee9ae6c91306 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cc_pipeline pipeline DistilBertForSequenceClassification from gtalibov +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cc_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cc_pipeline` is a English model originally trained by gtalibov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cc_pipeline_en_5.5.0_3.0_1727093727942.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cc_pipeline_en_5.5.0_3.0_1727093727942.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cc_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cc_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/gtalibov/distilbert-base-uncased-finetuned-CC + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_dodiaz2111_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_dodiaz2111_pipeline_en.md new file mode 100644 index 00000000000000..28f663d12cafb2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_dodiaz2111_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_dodiaz2111_pipeline pipeline DistilBertForSequenceClassification from dodiaz2111 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_dodiaz2111_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_dodiaz2111_pipeline` is a English model originally trained by dodiaz2111. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_dodiaz2111_pipeline_en_5.5.0_3.0_1727074044565.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_dodiaz2111_pipeline_en_5.5.0_3.0_1727074044565.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_dodiaz2111_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_dodiaz2111_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_dodiaz2111_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/dodiaz2111/distilbert-base-uncased-finetuned-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_hrayrm_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_hrayrm_pipeline_en.md new file mode 100644 index 00000000000000..94dfc3a7a48bf0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_hrayrm_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_hrayrm_pipeline pipeline DistilBertForSequenceClassification from HrayrM +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_hrayrm_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_hrayrm_pipeline` is a English model originally trained by HrayrM. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_hrayrm_pipeline_en_5.5.0_3.0_1727110401610.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_hrayrm_pipeline_en_5.5.0_3.0_1727110401610.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_hrayrm_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_hrayrm_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_hrayrm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/HrayrM/distilbert-base-uncased-finetuned-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_joacorf33_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_joacorf33_en.md new file mode 100644 index 00000000000000..2183dd664e4e91 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_clinc_joacorf33_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_joacorf33 DistilBertForSequenceClassification from joacorf33 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_joacorf33 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_joacorf33` is a English model originally trained by joacorf33. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_joacorf33_en_5.5.0_3.0_1727093600577.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_joacorf33_en_5.5.0_3.0_1727093600577.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_joacorf33","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_joacorf33", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_joacorf33| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/joacorf33/distilbert-base-uncased-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_garyseventeen_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_garyseventeen_en.md new file mode 100644 index 00000000000000..d07aa04b64287f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_garyseventeen_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_garyseventeen DistilBertForSequenceClassification from Garyseventeen +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_garyseventeen +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_garyseventeen` is a English model originally trained by Garyseventeen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_garyseventeen_en_5.5.0_3.0_1727110523624.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_garyseventeen_en_5.5.0_3.0_1727110523624.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_garyseventeen","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_garyseventeen", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_garyseventeen| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Garyseventeen/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_garyseventeen_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_garyseventeen_pipeline_en.md new file mode 100644 index 00000000000000..35ff5f44405fa1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_garyseventeen_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_garyseventeen_pipeline pipeline DistilBertForSequenceClassification from Garyseventeen +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_garyseventeen_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_garyseventeen_pipeline` is a English model originally trained by Garyseventeen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_garyseventeen_pipeline_en_5.5.0_3.0_1727110535749.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_garyseventeen_pipeline_en_5.5.0_3.0_1727110535749.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_garyseventeen_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_garyseventeen_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_garyseventeen_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Garyseventeen/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_poodja_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_poodja_en.md new file mode 100644 index 00000000000000..f129e8804a0036 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_poodja_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_poodja DistilBertForSequenceClassification from Poodja +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_poodja +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_poodja` is a English model originally trained by Poodja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_poodja_en_5.5.0_3.0_1727108543457.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_poodja_en_5.5.0_3.0_1727108543457.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_poodja","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_poodja", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_poodja| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Poodja/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_sjoerdvink_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_sjoerdvink_en.md new file mode 100644 index 00000000000000..4a5752d8029e01 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_sjoerdvink_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_sjoerdvink DistilBertForSequenceClassification from sjoerdvink +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_sjoerdvink +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_sjoerdvink` is a English model originally trained by sjoerdvink. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_sjoerdvink_en_5.5.0_3.0_1727082297367.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_sjoerdvink_en_5.5.0_3.0_1727082297367.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_sjoerdvink","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_sjoerdvink", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_sjoerdvink| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sjoerdvink/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_zeid_hazboun_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_zeid_hazboun_en.md new file mode 100644 index 00000000000000..f2f5c9f3df3292 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_cola_zeid_hazboun_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_zeid_hazboun DistilBertForSequenceClassification from Zeid-Hazboun +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_zeid_hazboun +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_zeid_hazboun` is a English model originally trained by Zeid-Hazboun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_zeid_hazboun_en_5.5.0_3.0_1727108397380.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_zeid_hazboun_en_5.5.0_3.0_1727108397380.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_zeid_hazboun","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_zeid_hazboun", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_zeid_hazboun| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Zeid-Hazboun/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_conceptos_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_conceptos_pipeline_en.md new file mode 100644 index 00000000000000..6c73e9786efdaf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_conceptos_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_conceptos_pipeline pipeline DistilBertForSequenceClassification from jcesquivel +author: John Snow Labs +name: distilbert_base_uncased_finetuned_conceptos_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_conceptos_pipeline` is a English model originally trained by jcesquivel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_conceptos_pipeline_en_5.5.0_3.0_1727087125822.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_conceptos_pipeline_en_5.5.0_3.0_1727087125822.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_conceptos_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_conceptos_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_conceptos_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/jcesquivel/distilbert-base-uncased-finetuned-conceptos + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_depression_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_depression_en.md new file mode 100644 index 00000000000000..b2b7d5ed223a67 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_depression_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_depression DistilBertForSequenceClassification from welsachy +author: John Snow Labs +name: distilbert_base_uncased_finetuned_depression +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_depression` is a English model originally trained by welsachy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_depression_en_5.5.0_3.0_1727108588595.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_depression_en_5.5.0_3.0_1727108588595.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_depression","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_depression", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_depression| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/welsachy/distilbert-base-uncased-finetuned-depression \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_disaster_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_disaster_pipeline_en.md new file mode 100644 index 00000000000000..dd5d4ef12078ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_disaster_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_disaster_pipeline pipeline DistilBertForSequenceClassification from RaiRachit +author: John Snow Labs +name: distilbert_base_uncased_finetuned_disaster_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_disaster_pipeline` is a English model originally trained by RaiRachit. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_disaster_pipeline_en_5.5.0_3.0_1727108529116.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_disaster_pipeline_en_5.5.0_3.0_1727108529116.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_disaster_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_disaster_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_disaster_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/RaiRachit/distilbert-base-uncased-finetuned-disaster + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_2hab_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_2hab_pipeline_en.md new file mode 100644 index 00000000000000..d048816bddf34b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_2hab_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_2hab_pipeline pipeline DistilBertForSequenceClassification from 2hab +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_2hab_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_2hab_pipeline` is a English model originally trained by 2hab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_2hab_pipeline_en_5.5.0_3.0_1727110408334.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_2hab_pipeline_en_5.5.0_3.0_1727110408334.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_2hab_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_2hab_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_2hab_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/2hab/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_adelineli_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_adelineli_en.md new file mode 100644 index 00000000000000..f57b7816679dac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_adelineli_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_adelineli DistilBertForSequenceClassification from adelineli +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_adelineli +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_adelineli` is a English model originally trained by adelineli. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_adelineli_en_5.5.0_3.0_1727108384522.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_adelineli_en_5.5.0_3.0_1727108384522.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_adelineli","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_adelineli", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_adelineli| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/adelineli/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_adelineli_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_adelineli_pipeline_en.md new file mode 100644 index 00000000000000..8381d977289ca2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_adelineli_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_adelineli_pipeline pipeline DistilBertForSequenceClassification from adelineli +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_adelineli_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_adelineli_pipeline` is a English model originally trained by adelineli. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_adelineli_pipeline_en_5.5.0_3.0_1727108396360.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_adelineli_pipeline_en_5.5.0_3.0_1727108396360.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_adelineli_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_adelineli_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_adelineli_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/adelineli/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_cogsci13_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_cogsci13_pipeline_en.md new file mode 100644 index 00000000000000..832865b0f7fb73 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_cogsci13_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_cogsci13_pipeline pipeline DistilBertForSequenceClassification from cogsci13 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_cogsci13_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_cogsci13_pipeline` is a English model originally trained by cogsci13. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_cogsci13_pipeline_en_5.5.0_3.0_1727082522178.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_cogsci13_pipeline_en_5.5.0_3.0_1727082522178.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_cogsci13_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_cogsci13_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_cogsci13_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/cogsci13/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_devs0n_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_devs0n_pipeline_en.md new file mode 100644 index 00000000000000..fe1a00a3e51f32 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_devs0n_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_devs0n_pipeline pipeline DistilBertForSequenceClassification from devs0n +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_devs0n_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_devs0n_pipeline` is a English model originally trained by devs0n. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_devs0n_pipeline_en_5.5.0_3.0_1727059485004.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_devs0n_pipeline_en_5.5.0_3.0_1727059485004.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_devs0n_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_devs0n_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_devs0n_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/devs0n/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_jachs182_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_jachs182_en.md new file mode 100644 index 00000000000000..b9475af22bb757 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_jachs182_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_jachs182 DistilBertForSequenceClassification from jachs182 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_jachs182 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_jachs182` is a English model originally trained by jachs182. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jachs182_en_5.5.0_3.0_1727059762812.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jachs182_en_5.5.0_3.0_1727059762812.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_jachs182","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_jachs182", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_jachs182| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jachs182/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_jrsky_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_jrsky_en.md new file mode 100644 index 00000000000000..a3352d304dce1a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_jrsky_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_jrsky DistilBertForSequenceClassification from jrsky +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_jrsky +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_jrsky` is a English model originally trained by jrsky. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jrsky_en_5.5.0_3.0_1727073517647.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jrsky_en_5.5.0_3.0_1727073517647.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_jrsky","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_jrsky", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_jrsky| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jrsky/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_jrsky_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_jrsky_pipeline_en.md new file mode 100644 index 00000000000000..c37d128a3db7ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_jrsky_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_jrsky_pipeline pipeline DistilBertForSequenceClassification from jrsky +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_jrsky_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_jrsky_pipeline` is a English model originally trained by jrsky. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jrsky_pipeline_en_5.5.0_3.0_1727073536479.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_jrsky_pipeline_en_5.5.0_3.0_1727073536479.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_jrsky_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_jrsky_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_jrsky_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jrsky/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_mikhab_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_mikhab_pipeline_en.md new file mode 100644 index 00000000000000..0532d821d39cbf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_mikhab_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_mikhab_pipeline pipeline DistilBertForSequenceClassification from mikhab +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_mikhab_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_mikhab_pipeline` is a English model originally trained by mikhab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_mikhab_pipeline_en_5.5.0_3.0_1727059492056.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_mikhab_pipeline_en_5.5.0_3.0_1727059492056.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_mikhab_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_mikhab_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_mikhab_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/mikhab/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_trsekhar_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_trsekhar_pipeline_en.md new file mode 100644 index 00000000000000..81fbb07fe72c2a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_trsekhar_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_trsekhar_pipeline pipeline DistilBertForSequenceClassification from trsekhar +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_trsekhar_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_trsekhar_pipeline` is a English model originally trained by trsekhar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_trsekhar_pipeline_en_5.5.0_3.0_1727073969465.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_trsekhar_pipeline_en_5.5.0_3.0_1727073969465.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_trsekhar_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_trsekhar_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_trsekhar_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/trsekhar/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_zcstarr_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_zcstarr_en.md new file mode 100644 index 00000000000000..280e9d9fa157bb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_emotion_zcstarr_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_zcstarr DistilBertForSequenceClassification from zcstarr +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_zcstarr +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_zcstarr` is a English model originally trained by zcstarr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_zcstarr_en_5.5.0_3.0_1727059325176.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_zcstarr_en_5.5.0_3.0_1727059325176.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_zcstarr","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_zcstarr", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_zcstarr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/zcstarr/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_intro2_verizon_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_intro2_verizon_pipeline_en.md new file mode 100644 index 00000000000000..f9bfb2a0ea0729 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finetuned_intro2_verizon_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_intro2_verizon_pipeline pipeline DistilBertForSequenceClassification from TieIncred +author: John Snow Labs +name: distilbert_base_uncased_finetuned_intro2_verizon_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_intro2_verizon_pipeline` is a English model originally trained by TieIncred. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_intro2_verizon_pipeline_en_5.5.0_3.0_1727086987155.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_intro2_verizon_pipeline_en_5.5.0_3.0_1727086987155.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_intro2_verizon_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_intro2_verizon_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_intro2_verizon_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/TieIncred/distilbert-base-uncased-finetuned-intro2-verizon + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finteuned_emotion_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finteuned_emotion_en.md new file mode 100644 index 00000000000000..3ba4e7b67e8ee5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_finteuned_emotion_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finteuned_emotion DistilBertForSequenceClassification from sknera +author: John Snow Labs +name: distilbert_base_uncased_finteuned_emotion +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finteuned_emotion` is a English model originally trained by sknera. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finteuned_emotion_en_5.5.0_3.0_1727074062122.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finteuned_emotion_en_5.5.0_3.0_1727074062122.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finteuned_emotion","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finteuned_emotion", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finteuned_emotion| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sknera/distilbert-base-uncased-finteuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_imdb_dfurman_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_imdb_dfurman_pipeline_en.md new file mode 100644 index 00000000000000..4916b1f1223c02 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_imdb_dfurman_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_imdb_dfurman_pipeline pipeline DistilBertForSequenceClassification from dfurman +author: John Snow Labs +name: distilbert_base_uncased_imdb_dfurman_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_imdb_dfurman_pipeline` is a English model originally trained by dfurman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_imdb_dfurman_pipeline_en_5.5.0_3.0_1727086937837.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_imdb_dfurman_pipeline_en_5.5.0_3.0_1727086937837.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_imdb_dfurman_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_imdb_dfurman_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_imdb_dfurman_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/dfurman/distilbert-base-uncased-imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_lora_text_classification_lincgr_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_lora_text_classification_lincgr_en.md new file mode 100644 index 00000000000000..6e201509c0dbf4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_lora_text_classification_lincgr_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_lora_text_classification_lincgr DistilBertForSequenceClassification from lincgr +author: John Snow Labs +name: distilbert_base_uncased_lora_text_classification_lincgr +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_lora_text_classification_lincgr` is a English model originally trained by lincgr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_lora_text_classification_lincgr_en_5.5.0_3.0_1727059557405.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_lora_text_classification_lincgr_en_5.5.0_3.0_1727059557405.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_lora_text_classification_lincgr","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_lora_text_classification_lincgr", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_lora_text_classification_lincgr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lincgr/distilbert-base-uncased-lora-text-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st16sd_ut12ut1_plprefix0stlarge_simsp100_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st16sd_ut12ut1_plprefix0stlarge_simsp100_en.md new file mode 100644 index 00000000000000..d325a7b0d9935a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st16sd_ut12ut1_plprefix0stlarge_simsp100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st16sd_ut12ut1_plprefix0stlarge_simsp100 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st16sd_ut12ut1_plprefix0stlarge_simsp100 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st16sd_ut12ut1_plprefix0stlarge_simsp100` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st16sd_ut12ut1_plprefix0stlarge_simsp100_en_5.5.0_3.0_1727097377994.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st16sd_ut12ut1_plprefix0stlarge_simsp100_en_5.5.0_3.0_1727097377994.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st16sd_ut12ut1_plprefix0stlarge_simsp100","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st16sd_ut12ut1_plprefix0stlarge_simsp100", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st16sd_ut12ut1_plprefix0stlarge_simsp100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st16sd_ut12ut1_PLPrefix0stlarge_simsp100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200_pipeline_en.md new file mode 100644 index 00000000000000..c4c7f1e21c8b67 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200_pipeline_en_5.5.0_3.0_1727108569798.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200_pipeline_en_5.5.0_3.0_1727108569798.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st16sd_ut72ut1largepfxnf_simsp300_clean200_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st16sd_ut72ut1largePfxNf_simsp300_clean200 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100_pipeline_en.md new file mode 100644 index 00000000000000..461d86c16b5cf7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100_pipeline_en_5.5.0_3.0_1727110627225.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100_pipeline_en_5.5.0_3.0_1727110627225.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st16sd_ut72ut3_plprefix0stlarge_simsp100_clean100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st16sd_ut72ut3_PLPrefix0stlarge_simsp100_clean100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st21sd_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st21sd_en.md new file mode 100644 index 00000000000000..bc2665ea6ebcb8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st21sd_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st21sd DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st21sd +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st21sd` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st21sd_en_5.5.0_3.0_1727094104025.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st21sd_en_5.5.0_3.0_1727094104025.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st21sd","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st21sd", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st21sd| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st21sd \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline_en.md new file mode 100644 index 00000000000000..fc2cb577a1b7b0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline_en_5.5.0_3.0_1727093579982.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline_en_5.5.0_3.0_1727093579982.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st2sd_ut72ut5_plprefix0stlarge_simsp100_clean100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st2sd_ut72ut5_PLPrefix0stlarge_simsp100_clean100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean100_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean100_en.md new file mode 100644 index 00000000000000..94218329e654f2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean100_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean100 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean100 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean100` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean100_en_5.5.0_3.0_1727082750419.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean100_en_5.5.0_3.0_1727082750419.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean100","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean100", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean100| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st6sd_ut72ut5_PLPrefix0stlarge_simsp100_clean100 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en.md new file mode 100644 index 00000000000000..2b6733a6b7acd2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200 DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en_5.5.0_3.0_1727073730231.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200_en_5.5.0_3.0_1727073730231.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st6sd_ut72ut5_PLPrefix0stlarge_simsp100_clean200 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline_en.md new file mode 100644 index 00000000000000..3dc04416cf68f2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline_en_5.5.0_3.0_1727073742331.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline_en_5.5.0_3.0_1727073742331.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st6sd_ut72ut5_plprefix0stlarge_simsp100_clean200_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st6sd_ut72ut5_PLPrefix0stlarge_simsp100_clean200 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_travel_zphr_0st_ut102ut1_plain_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_travel_zphr_0st_ut102ut1_plain_simsp_en.md new file mode 100644 index 00000000000000..fa8df53934ca63 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_base_uncased_travel_zphr_0st_ut102ut1_plain_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_travel_zphr_0st_ut102ut1_plain_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_travel_zphr_0st_ut102ut1_plain_simsp +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_travel_zphr_0st_ut102ut1_plain_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut102ut1_plain_simsp_en_5.5.0_3.0_1727096947443.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_travel_zphr_0st_ut102ut1_plain_simsp_en_5.5.0_3.0_1727096947443.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_travel_zphr_0st_ut102ut1_plain_simsp","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_travel_zphr_0st_ut102ut1_plain_simsp", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_travel_zphr_0st_ut102ut1_plain_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_travel_zphr_0st_ut102ut1_plain_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_fine_turned_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_fine_turned_classification_pipeline_en.md new file mode 100644 index 00000000000000..765fc7f70b7f3c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_fine_turned_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_fine_turned_classification_pipeline pipeline DistilBertForSequenceClassification from abhimanyuaryan +author: John Snow Labs +name: distilbert_fine_turned_classification_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_fine_turned_classification_pipeline` is a English model originally trained by abhimanyuaryan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_fine_turned_classification_pipeline_en_5.5.0_3.0_1727110516132.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_fine_turned_classification_pipeline_en_5.5.0_3.0_1727110516132.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_fine_turned_classification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_fine_turned_classification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_fine_turned_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/abhimanyuaryan/distilbert-fine-turned-classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_foundation_category_funders_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_foundation_category_funders_pipeline_en.md new file mode 100644 index 00000000000000..dd16125c80ed5d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_foundation_category_funders_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_foundation_category_funders_pipeline pipeline DistilBertForSequenceClassification from eric-mc2 +author: John Snow Labs +name: distilbert_foundation_category_funders_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_foundation_category_funders_pipeline` is a English model originally trained by eric-mc2. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_foundation_category_funders_pipeline_en_5.5.0_3.0_1727108682112.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_foundation_category_funders_pipeline_en_5.5.0_3.0_1727108682112.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_foundation_category_funders_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_foundation_category_funders_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_foundation_category_funders_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/eric-mc2/distilbert-foundation-category-funders + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_ft_sst5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_ft_sst5_pipeline_en.md new file mode 100644 index 00000000000000..1ce753585a7eb6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_ft_sst5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_ft_sst5_pipeline pipeline DistilBertForSequenceClassification from pablo-chocobar +author: John Snow Labs +name: distilbert_ft_sst5_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_ft_sst5_pipeline` is a English model originally trained by pablo-chocobar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_ft_sst5_pipeline_en_5.5.0_3.0_1727097146414.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_ft_sst5_pipeline_en_5.5.0_3.0_1727097146414.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_ft_sst5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_ft_sst5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_ft_sst5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/pablo-chocobar/distilbert-ft-sst5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_imdb_decentmakeover13_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_imdb_decentmakeover13_en.md new file mode 100644 index 00000000000000..6661323b0cf8e6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_imdb_decentmakeover13_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_imdb_decentmakeover13 DistilBertForSequenceClassification from decentmakeover13 +author: John Snow Labs +name: distilbert_imdb_decentmakeover13 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_imdb_decentmakeover13` is a English model originally trained by decentmakeover13. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_imdb_decentmakeover13_en_5.5.0_3.0_1727082283749.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_imdb_decentmakeover13_en_5.5.0_3.0_1727082283749.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_imdb_decentmakeover13","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_imdb_decentmakeover13", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_imdb_decentmakeover13| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/decentmakeover13/distilbert-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_nbx_all_l_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_nbx_all_l_pipeline_en.md new file mode 100644 index 00000000000000..4023867c09ee48 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_nbx_all_l_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_nbx_all_l_pipeline pipeline DistilBertForSequenceClassification from vishnuhaasan +author: John Snow Labs +name: distilbert_nbx_all_l_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_nbx_all_l_pipeline` is a English model originally trained by vishnuhaasan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_nbx_all_l_pipeline_en_5.5.0_3.0_1727073535793.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_nbx_all_l_pipeline_en_5.5.0_3.0_1727073535793.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_nbx_all_l_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_nbx_all_l_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_nbx_all_l_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/vishnuhaasan/distilbert_nbx_all_l + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_new2_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_new2_en.md new file mode 100644 index 00000000000000..7e6648c7a27b71 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_new2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_new2 DistilBertForSequenceClassification from wnic00 +author: John Snow Labs +name: distilbert_new2 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_new2` is a English model originally trained by wnic00. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_new2_en_5.5.0_3.0_1727082733068.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_new2_en_5.5.0_3.0_1727082733068.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_new2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_new2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_new2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/wnic00/distilbert-new2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_new2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_new2_pipeline_en.md new file mode 100644 index 00000000000000..f9ead7b84cd5a1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_new2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_new2_pipeline pipeline DistilBertForSequenceClassification from wnic00 +author: John Snow Labs +name: distilbert_new2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_new2_pipeline` is a English model originally trained by wnic00. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_new2_pipeline_en_5.5.0_3.0_1727082744930.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_new2_pipeline_en_5.5.0_3.0_1727082744930.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_new2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_new2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_new2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/wnic00/distilbert-new2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_256_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_256_en.md new file mode 100644 index 00000000000000..db76f79a149c19 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_256_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_256 DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_256 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_256` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_256_en_5.5.0_3.0_1727108475119.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_256_en_5.5.0_3.0_1727108475119.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_256","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_256", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_data_aug_mrpc_256| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|71.7 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_data_aug_mrpc_256 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384_en.md new file mode 100644 index 00000000000000..18de5c490d0c27 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384 DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384_en_5.5.0_3.0_1727097080143.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384_en_5.5.0_3.0_1727097080143.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_qnli_384| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|111.9 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_qnli_384 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_sentiment_test_2023dec_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_sentiment_test_2023dec_pipeline_en.md new file mode 100644 index 00000000000000..072bb340235362 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_sentiment_test_2023dec_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sentiment_test_2023dec_pipeline pipeline DistilBertForSequenceClassification from FungSung +author: John Snow Labs +name: distilbert_sentiment_test_2023dec_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sentiment_test_2023dec_pipeline` is a English model originally trained by FungSung. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sentiment_test_2023dec_pipeline_en_5.5.0_3.0_1727108699800.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sentiment_test_2023dec_pipeline_en_5.5.0_3.0_1727108699800.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_sentiment_test_2023dec_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_sentiment_test_2023dec_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sentiment_test_2023dec_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/FungSung/distilBert_sentiment_test_2023DEC + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-distilbert_twitterfin_padding70model_en.md b/docs/_posts/ahmedlone127/2024-09-23-distilbert_twitterfin_padding70model_en.md new file mode 100644 index 00000000000000..2712701fb1312c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-distilbert_twitterfin_padding70model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_twitterfin_padding70model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_twitterfin_padding70model +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_twitterfin_padding70model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_twitterfin_padding70model_en_5.5.0_3.0_1727059744593.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_twitterfin_padding70model_en_5.5.0_3.0_1727059744593.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_twitterfin_padding70model","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_twitterfin_padding70model", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_twitterfin_padding70model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_twitterfin_padding70model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-environmentalbert_forest_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-environmentalbert_forest_pipeline_en.md new file mode 100644 index 00000000000000..9db6a3b81f56d3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-environmentalbert_forest_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English environmentalbert_forest_pipeline pipeline RoBertaForSequenceClassification from ESGBERT +author: John Snow Labs +name: environmentalbert_forest_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`environmentalbert_forest_pipeline` is a English model originally trained by ESGBERT. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/environmentalbert_forest_pipeline_en_5.5.0_3.0_1727135088453.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/environmentalbert_forest_pipeline_en_5.5.0_3.0_1727135088453.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("environmentalbert_forest_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("environmentalbert_forest_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|environmentalbert_forest_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|309.0 MB| + +## References + +https://huggingface.co/ESGBERT/EnvironmentalBERT-forest + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-fake_news_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-fake_news_pipeline_en.md new file mode 100644 index 00000000000000..786ca13f902edf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-fake_news_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English fake_news_pipeline pipeline DistilBertForSequenceClassification from nlp-godfathers +author: John Snow Labs +name: fake_news_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fake_news_pipeline` is a English model originally trained by nlp-godfathers. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fake_news_pipeline_en_5.5.0_3.0_1727082229446.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fake_news_pipeline_en_5.5.0_3.0_1727082229446.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("fake_news_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("fake_news_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fake_news_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.3 MB| + +## References + +https://huggingface.co/nlp-godfathers/fake_news + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-fine_tuned_roberta_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-fine_tuned_roberta_base_pipeline_en.md new file mode 100644 index 00000000000000..2050ec673e31d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-fine_tuned_roberta_base_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English fine_tuned_roberta_base_pipeline pipeline BertForQuestionAnswering from kiwakwok +author: John Snow Labs +name: fine_tuned_roberta_base_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tuned_roberta_base_pipeline` is a English model originally trained by kiwakwok. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tuned_roberta_base_pipeline_en_5.5.0_3.0_1727106733404.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tuned_roberta_base_pipeline_en_5.5.0_3.0_1727106733404.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("fine_tuned_roberta_base_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("fine_tuned_roberta_base_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tuned_roberta_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|382.3 MB| + +## References + +https://huggingface.co/kiwakwok/fine-tuned-roberta-base + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-fine_tuning_en.md b/docs/_posts/ahmedlone127/2024-09-23-fine_tuning_en.md new file mode 100644 index 00000000000000..c175317a51d20f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-fine_tuning_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English fine_tuning DistilBertForSequenceClassification from StevensRV93 +author: John Snow Labs +name: fine_tuning +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tuning` is a English model originally trained by StevensRV93. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tuning_en_5.5.0_3.0_1727093913733.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tuning_en_5.5.0_3.0_1727093913733.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("fine_tuning","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("fine_tuning", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tuning| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/StevensRV93/Fine_tuning \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner_pipeline_en.md new file mode 100644 index 00000000000000..801c516e3ea8ed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner_pipeline pipeline RoBertaForTokenClassification from manucos +author: John Snow Labs +name: finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner_pipeline` is a English model originally trained by manucos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner_pipeline_en_5.5.0_3.0_1727081628091.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner_pipeline_en_5.5.0_3.0_1727081628091.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned__roberta_clinical_wl_spanish__augmented_ultrasounds_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|469.8 MB| + +## References + +https://huggingface.co/manucos/finetuned__roberta-clinical-wl-es__augmented-ultrasounds-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuned_model_imsoumyaneel_25k_epoch_10_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuned_model_imsoumyaneel_25k_epoch_10_en.md new file mode 100644 index 00000000000000..83dc09fdfb9fc6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuned_model_imsoumyaneel_25k_epoch_10_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned_model_imsoumyaneel_25k_epoch_10 DistilBertForSequenceClassification from rahulgaikwad007 +author: John Snow Labs +name: finetuned_model_imsoumyaneel_25k_epoch_10 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_model_imsoumyaneel_25k_epoch_10` is a English model originally trained by rahulgaikwad007. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_model_imsoumyaneel_25k_epoch_10_en_5.5.0_3.0_1727086875704.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_model_imsoumyaneel_25k_epoch_10_en_5.5.0_3.0_1727086875704.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuned_model_imsoumyaneel_25k_epoch_10","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuned_model_imsoumyaneel_25k_epoch_10", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_model_imsoumyaneel_25k_epoch_10| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rahulgaikwad007/Finetuned-model-imsoumyaneel-25k-Epoch-10 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuned_model_on_4k_samples_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuned_model_on_4k_samples_pipeline_en.md new file mode 100644 index 00000000000000..06a3f249375fa8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuned_model_on_4k_samples_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuned_model_on_4k_samples_pipeline pipeline DistilBertForSequenceClassification from Wolverine001 +author: John Snow Labs +name: finetuned_model_on_4k_samples_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_model_on_4k_samples_pipeline` is a English model originally trained by Wolverine001. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_model_on_4k_samples_pipeline_en_5.5.0_3.0_1727059659123.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_model_on_4k_samples_pipeline_en_5.5.0_3.0_1727059659123.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuned_model_on_4k_samples_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuned_model_on_4k_samples_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_model_on_4k_samples_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Wolverine001/finetuned_model_on-4k-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ducpham1501_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ducpham1501_pipeline_en.md new file mode 100644 index 00000000000000..a9746674632a4b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ducpham1501_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_ducpham1501_pipeline pipeline DistilBertForSequenceClassification from DucPham1501 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_ducpham1501_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_ducpham1501_pipeline` is a English model originally trained by DucPham1501. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ducpham1501_pipeline_en_5.5.0_3.0_1727087293159.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ducpham1501_pipeline_en_5.5.0_3.0_1727087293159.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_ducpham1501_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_ducpham1501_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_ducpham1501_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/DucPham1501/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_h9v8_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_h9v8_en.md new file mode 100644 index 00000000000000..d25435add01229 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_h9v8_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_h9v8 DistilBertForSequenceClassification from H9V8 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_h9v8 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_h9v8` is a English model originally trained by H9V8. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_h9v8_en_5.5.0_3.0_1727097239450.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_h9v8_en_5.5.0_3.0_1727097239450.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_h9v8","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_h9v8", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_h9v8| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/H9V8/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ih8l1ght_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ih8l1ght_en.md new file mode 100644 index 00000000000000..443932da4ccede --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_ih8l1ght_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_ih8l1ght DistilBertForSequenceClassification from ih8l1ght +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_ih8l1ght +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_ih8l1ght` is a English model originally trained by ih8l1ght. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ih8l1ght_en_5.5.0_3.0_1727094027763.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_ih8l1ght_en_5.5.0_3.0_1727094027763.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_ih8l1ght","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_ih8l1ght", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_ih8l1ght| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ih8l1ght/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_lusinep_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_lusinep_en.md new file mode 100644 index 00000000000000..f25796d7b2c6dd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_lusinep_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_lusinep DistilBertForSequenceClassification from lusinep +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_lusinep +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_lusinep` is a English model originally trained by lusinep. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_lusinep_en_5.5.0_3.0_1727059771336.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_lusinep_en_5.5.0_3.0_1727059771336.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_lusinep","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_samples_lusinep", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_lusinep| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lusinep/finetuning-sentiment-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_lusinep_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_lusinep_pipeline_en.md new file mode 100644 index 00000000000000..288482b39730c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_lusinep_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_lusinep_pipeline pipeline DistilBertForSequenceClassification from lusinep +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_lusinep_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_lusinep_pipeline` is a English model originally trained by lusinep. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_lusinep_pipeline_en_5.5.0_3.0_1727059785312.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_lusinep_pipeline_en_5.5.0_3.0_1727059785312.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_lusinep_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_lusinep_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_lusinep_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lusinep/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_murali07_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_murali07_pipeline_en.md new file mode 100644 index 00000000000000..4c79ce580369f0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_murali07_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_murali07_pipeline pipeline DistilBertForSequenceClassification from murali07 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_murali07_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_murali07_pipeline` is a English model originally trained by murali07. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_murali07_pipeline_en_5.5.0_3.0_1727108485534.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_murali07_pipeline_en_5.5.0_3.0_1727108485534.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_murali07_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_murali07_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_murali07_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/murali07/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_nandyala12_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_nandyala12_pipeline_en.md new file mode 100644 index 00000000000000..8a86eba06d4e88 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_3000_samples_nandyala12_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_nandyala12_pipeline pipeline DistilBertForSequenceClassification from Nandyala12 +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_nandyala12_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_nandyala12_pipeline` is a English model originally trained by Nandyala12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_nandyala12_pipeline_en_5.5.0_3.0_1727097369985.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_nandyala12_pipeline_en_5.5.0_3.0_1727097369985.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_nandyala12_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_nandyala12_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_nandyala12_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Nandyala12/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_5000_amazon_cm_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_5000_amazon_cm_pipeline_en.md new file mode 100644 index 00000000000000..9a5bcf401c82a3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_5000_amazon_cm_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_5000_amazon_cm_pipeline pipeline DistilBertForSequenceClassification from abyesses +author: John Snow Labs +name: finetuning_sentiment_model_5000_amazon_cm_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_5000_amazon_cm_pipeline` is a English model originally trained by abyesses. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_5000_amazon_cm_pipeline_en_5.5.0_3.0_1727096962720.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_5000_amazon_cm_pipeline_en_5.5.0_3.0_1727096962720.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_5000_amazon_cm_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_5000_amazon_cm_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_5000_amazon_cm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/abyesses/finetuning-sentiment-model-5000-amazon_cm + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_lomve_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_lomve_en.md new file mode 100644 index 00000000000000..b98abf3fbf261e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_lomve_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_lomve DistilBertForSequenceClassification from lomve +author: John Snow Labs +name: finetuning_sentiment_model_lomve +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_lomve` is a English model originally trained by lomve. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_lomve_en_5.5.0_3.0_1727082406112.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_lomve_en_5.5.0_3.0_1727082406112.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_lomve","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_lomve", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_lomve| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lomve/finetuning-sentiment-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_lomve_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_lomve_pipeline_en.md new file mode 100644 index 00000000000000..509e015a9dad1c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-finetuning_sentiment_model_lomve_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_lomve_pipeline pipeline DistilBertForSequenceClassification from lomve +author: John Snow Labs +name: finetuning_sentiment_model_lomve_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_lomve_pipeline` is a English model originally trained by lomve. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_lomve_pipeline_en_5.5.0_3.0_1727082420725.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_lomve_pipeline_en_5.5.0_3.0_1727082420725.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_lomve_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_lomve_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_lomve_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lomve/finetuning-sentiment-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-happy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-happy_pipeline_en.md new file mode 100644 index 00000000000000..9f3343de126ced --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-happy_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English happy_pipeline pipeline RoBertaEmbeddings from MatthijsN +author: John Snow Labs +name: happy_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`happy_pipeline` is a English model originally trained by MatthijsN. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/happy_pipeline_en_5.5.0_3.0_1727057083302.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/happy_pipeline_en_5.5.0_3.0_1727057083302.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("happy_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("happy_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|happy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.4 MB| + +## References + +https://huggingface.co/MatthijsN/happy + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-harmful_content_trainer_en.md b/docs/_posts/ahmedlone127/2024-09-23-harmful_content_trainer_en.md new file mode 100644 index 00000000000000..1c5084101da8af --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-harmful_content_trainer_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English harmful_content_trainer DistilBertForSequenceClassification from AIUs3r0 +author: John Snow Labs +name: harmful_content_trainer +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`harmful_content_trainer` is a English model originally trained by AIUs3r0. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/harmful_content_trainer_en_5.5.0_3.0_1727059847466.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/harmful_content_trainer_en_5.5.0_3.0_1727059847466.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("harmful_content_trainer","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("harmful_content_trainer", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|harmful_content_trainer| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/AIUs3r0/Harmful_Content_Trainer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-humor_recognition_greek_distilbert_el.md b/docs/_posts/ahmedlone127/2024-09-23-humor_recognition_greek_distilbert_el.md new file mode 100644 index 00000000000000..a9e93ccc0f8f35 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-humor_recognition_greek_distilbert_el.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Modern Greek (1453-) humor_recognition_greek_distilbert DistilBertForSequenceClassification from Kalloniatis +author: John Snow Labs +name: humor_recognition_greek_distilbert +date: 2024-09-23 +tags: [el, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: el +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`humor_recognition_greek_distilbert` is a Modern Greek (1453-) model originally trained by Kalloniatis. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/humor_recognition_greek_distilbert_el_5.5.0_3.0_1727074177409.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/humor_recognition_greek_distilbert_el_5.5.0_3.0_1727074177409.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("humor_recognition_greek_distilbert","el") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("humor_recognition_greek_distilbert", "el") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|humor_recognition_greek_distilbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|el| +|Size:|507.6 MB| + +## References + +https://huggingface.co/Kalloniatis/Humor-Recognition-Greek-DistilBERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-hw01_chchang_en.md b/docs/_posts/ahmedlone127/2024-09-23-hw01_chchang_en.md new file mode 100644 index 00000000000000..928a4b1a93716f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-hw01_chchang_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hw01_chchang DistilBertForSequenceClassification from CHChang +author: John Snow Labs +name: hw01_chchang +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hw01_chchang` is a English model originally trained by CHChang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hw01_chchang_en_5.5.0_3.0_1727093797638.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hw01_chchang_en_5.5.0_3.0_1727093797638.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("hw01_chchang","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("hw01_chchang", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hw01_chchang| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/CHChang/HW01 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-hw1223_01_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-hw1223_01_pipeline_en.md new file mode 100644 index 00000000000000..4496732edd9a25 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-hw1223_01_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hw1223_01_pipeline pipeline DistilBertForSequenceClassification from tunyu +author: John Snow Labs +name: hw1223_01_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hw1223_01_pipeline` is a English model originally trained by tunyu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hw1223_01_pipeline_en_5.5.0_3.0_1727059554935.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hw1223_01_pipeline_en_5.5.0_3.0_1727059554935.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hw1223_01_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hw1223_01_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hw1223_01_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/tunyu/HW1223_01 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-hw_1_irisliou_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-hw_1_irisliou_pipeline_en.md new file mode 100644 index 00000000000000..2b7f9feb9472d6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-hw_1_irisliou_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hw_1_irisliou_pipeline pipeline DistilBertForSequenceClassification from IrisLiou +author: John Snow Labs +name: hw_1_irisliou_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hw_1_irisliou_pipeline` is a English model originally trained by IrisLiou. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hw_1_irisliou_pipeline_en_5.5.0_3.0_1727093832450.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hw_1_irisliou_pipeline_en_5.5.0_3.0_1727093832450.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hw_1_irisliou_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hw_1_irisliou_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hw_1_irisliou_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/IrisLiou/hw-1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-id2223_lab2_whisper_nelanbu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-id2223_lab2_whisper_nelanbu_pipeline_en.md new file mode 100644 index 00000000000000..042a3977ca5835 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-id2223_lab2_whisper_nelanbu_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English id2223_lab2_whisper_nelanbu_pipeline pipeline WhisperForCTC from nelanbu +author: John Snow Labs +name: id2223_lab2_whisper_nelanbu_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`id2223_lab2_whisper_nelanbu_pipeline` is a English model originally trained by nelanbu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/id2223_lab2_whisper_nelanbu_pipeline_en_5.5.0_3.0_1727052076788.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/id2223_lab2_whisper_nelanbu_pipeline_en_5.5.0_3.0_1727052076788.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("id2223_lab2_whisper_nelanbu_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("id2223_lab2_whisper_nelanbu_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|id2223_lab2_whisper_nelanbu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/nelanbu/ID2223_Lab2_Whisper + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-inisw08_robert_mlm_adamw_torch_bs8_en.md b/docs/_posts/ahmedlone127/2024-09-23-inisw08_robert_mlm_adamw_torch_bs8_en.md new file mode 100644 index 00000000000000..24b77883261731 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-inisw08_robert_mlm_adamw_torch_bs8_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English inisw08_robert_mlm_adamw_torch_bs8 RoBertaEmbeddings from ugiugi +author: John Snow Labs +name: inisw08_robert_mlm_adamw_torch_bs8 +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`inisw08_robert_mlm_adamw_torch_bs8` is a English model originally trained by ugiugi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/inisw08_robert_mlm_adamw_torch_bs8_en_5.5.0_3.0_1727066103391.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/inisw08_robert_mlm_adamw_torch_bs8_en_5.5.0_3.0_1727066103391.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("inisw08_robert_mlm_adamw_torch_bs8","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("inisw08_robert_mlm_adamw_torch_bs8","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|inisw08_robert_mlm_adamw_torch_bs8| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|465.6 MB| + +## References + +https://huggingface.co/ugiugi/inisw08-RoBERT-mlm-adamw_torch_bs8 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-kanglish_offensive_language_identification_en.md b/docs/_posts/ahmedlone127/2024-09-23-kanglish_offensive_language_identification_en.md new file mode 100644 index 00000000000000..fe81565abc1430 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-kanglish_offensive_language_identification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English kanglish_offensive_language_identification RoBertaForSequenceClassification from seanbenhur +author: John Snow Labs +name: kanglish_offensive_language_identification +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kanglish_offensive_language_identification` is a English model originally trained by seanbenhur. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kanglish_offensive_language_identification_en_5.5.0_3.0_1727134915581.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kanglish_offensive_language_identification_en_5.5.0_3.0_1727134915581.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("kanglish_offensive_language_identification","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("kanglish_offensive_language_identification", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kanglish_offensive_language_identification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|451.8 MB| + +## References + +https://huggingface.co/seanbenhur/kanglish-offensive-language-identification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-lab1_yanwen9969_en.md b/docs/_posts/ahmedlone127/2024-09-23-lab1_yanwen9969_en.md new file mode 100644 index 00000000000000..6eb0b09b70278f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-lab1_yanwen9969_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English lab1_yanwen9969 DistilBertForSequenceClassification from Yanwen9969 +author: John Snow Labs +name: lab1_yanwen9969 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lab1_yanwen9969` is a English model originally trained by Yanwen9969. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lab1_yanwen9969_en_5.5.0_3.0_1727108287546.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lab1_yanwen9969_en_5.5.0_3.0_1727108287546.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("lab1_yanwen9969","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("lab1_yanwen9969", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lab1_yanwen9969| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Yanwen9969/Lab1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-lab1_yanwen9969_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-lab1_yanwen9969_pipeline_en.md new file mode 100644 index 00000000000000..7c0f44568f52c0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-lab1_yanwen9969_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English lab1_yanwen9969_pipeline pipeline DistilBertForSequenceClassification from Yanwen9969 +author: John Snow Labs +name: lab1_yanwen9969_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lab1_yanwen9969_pipeline` is a English model originally trained by Yanwen9969. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lab1_yanwen9969_pipeline_en_5.5.0_3.0_1727108304479.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lab1_yanwen9969_pipeline_en_5.5.0_3.0_1727108304479.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("lab1_yanwen9969_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("lab1_yanwen9969_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lab1_yanwen9969_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Yanwen9969/Lab1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-lab_11_distilbert_sentiment_en.md b/docs/_posts/ahmedlone127/2024-09-23-lab_11_distilbert_sentiment_en.md new file mode 100644 index 00000000000000..4f88a4d6884ff4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-lab_11_distilbert_sentiment_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English lab_11_distilbert_sentiment DistilBertForSequenceClassification from Malecc +author: John Snow Labs +name: lab_11_distilbert_sentiment +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lab_11_distilbert_sentiment` is a English model originally trained by Malecc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lab_11_distilbert_sentiment_en_5.5.0_3.0_1727097178410.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lab_11_distilbert_sentiment_en_5.5.0_3.0_1727097178410.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("lab_11_distilbert_sentiment","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("lab_11_distilbert_sentiment", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lab_11_distilbert_sentiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Malecc/lab_11_distilbert_sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-len_pruned_30_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-len_pruned_30_model_pipeline_en.md new file mode 100644 index 00000000000000..e8b150caff080f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-len_pruned_30_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English len_pruned_30_model_pipeline pipeline DistilBertForSequenceClassification from andygoh5 +author: John Snow Labs +name: len_pruned_30_model_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`len_pruned_30_model_pipeline` is a English model originally trained by andygoh5. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/len_pruned_30_model_pipeline_en_5.5.0_3.0_1727093724827.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/len_pruned_30_model_pipeline_en_5.5.0_3.0_1727093724827.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("len_pruned_30_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("len_pruned_30_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|len_pruned_30_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/andygoh5/len-pruned-30-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-liar_fake_news_roberta_base_en.md b/docs/_posts/ahmedlone127/2024-09-23-liar_fake_news_roberta_base_en.md new file mode 100644 index 00000000000000..53f2758d1321c6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-liar_fake_news_roberta_base_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English liar_fake_news_roberta_base RoBertaEmbeddings from Jawaher +author: John Snow Labs +name: liar_fake_news_roberta_base +date: 2024-09-23 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`liar_fake_news_roberta_base` is a English model originally trained by Jawaher. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/liar_fake_news_roberta_base_en_5.5.0_3.0_1727092161072.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/liar_fake_news_roberta_base_en_5.5.0_3.0_1727092161072.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("liar_fake_news_roberta_base","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("liar_fake_news_roberta_base","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|liar_fake_news_roberta_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/Jawaher/LIAR-fake-news-roberta-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-melodea_final_model_en.md b/docs/_posts/ahmedlone127/2024-09-23-melodea_final_model_en.md new file mode 100644 index 00000000000000..11c6216466a90a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-melodea_final_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English melodea_final_model RoBertaForSequenceClassification from GabiRayman +author: John Snow Labs +name: melodea_final_model +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`melodea_final_model` is a English model originally trained by GabiRayman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/melodea_final_model_en_5.5.0_3.0_1727055072815.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/melodea_final_model_en_5.5.0_3.0_1727055072815.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("melodea_final_model","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("melodea_final_model", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|melodea_final_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|429.4 MB| + +## References + +https://huggingface.co/GabiRayman/melodea_final-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-missingbertmodelfinal1_en.md b/docs/_posts/ahmedlone127/2024-09-23-missingbertmodelfinal1_en.md new file mode 100644 index 00000000000000..874bfa0ef3f053 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-missingbertmodelfinal1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English missingbertmodelfinal1 DistilBertForSequenceClassification from sachit56 +author: John Snow Labs +name: missingbertmodelfinal1 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`missingbertmodelfinal1` is a English model originally trained by sachit56. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/missingbertmodelfinal1_en_5.5.0_3.0_1727059660384.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/missingbertmodelfinal1_en_5.5.0_3.0_1727059660384.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("missingbertmodelfinal1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("missingbertmodelfinal1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|missingbertmodelfinal1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sachit56/missingbertmodelfinal1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-missingbertmodelfinal1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-missingbertmodelfinal1_pipeline_en.md new file mode 100644 index 00000000000000..0f5ab07f52bb52 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-missingbertmodelfinal1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English missingbertmodelfinal1_pipeline pipeline DistilBertForSequenceClassification from sachit56 +author: John Snow Labs +name: missingbertmodelfinal1_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`missingbertmodelfinal1_pipeline` is a English model originally trained by sachit56. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/missingbertmodelfinal1_pipeline_en_5.5.0_3.0_1727059673068.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/missingbertmodelfinal1_pipeline_en_5.5.0_3.0_1727059673068.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("missingbertmodelfinal1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("missingbertmodelfinal1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|missingbertmodelfinal1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sachit56/missingbertmodelfinal1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-mt5_base_visp_s2_en.md b/docs/_posts/ahmedlone127/2024-09-23-mt5_base_visp_s2_en.md new file mode 100644 index 00000000000000..4709d8a62ad48f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-mt5_base_visp_s2_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English mt5_base_visp_s2 T5Transformer from ngwgsang +author: John Snow Labs +name: mt5_base_visp_s2 +date: 2024-09-23 +tags: [en, open_source, onnx, t5, question_answering, summarization, translation, text_generation] +task: [Question Answering, Summarization, Translation, Text Generation] +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: T5Transformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained T5Transformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mt5_base_visp_s2` is a English model originally trained by ngwgsang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mt5_base_visp_s2_en_5.5.0_3.0_1727068944585.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mt5_base_visp_s2_en_5.5.0_3.0_1727068944585.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +t5 = T5Transformer.pretrained("mt5_base_visp_s2","en") \ + .setInputCols(["document"]) \ + .setOutputCol("output") + +pipeline = Pipeline().setStages([documentAssembler, t5]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val t5 = T5Transformer.pretrained("mt5_base_visp_s2", "en") + .setInputCols(Array("documents")) + .setOutputCol("output") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, t5)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mt5_base_visp_s2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[output]| +|Language:|en| +|Size:|2.3 GB| + +## References + +https://huggingface.co/ngwgsang/mt5-base-visp-s2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-muril_large_chaii_hi.md b/docs/_posts/ahmedlone127/2024-09-23-muril_large_chaii_hi.md new file mode 100644 index 00000000000000..969550450a428b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-muril_large_chaii_hi.md @@ -0,0 +1,86 @@ +--- +layout: model +title: Hindi muril_large_chaii BertForQuestionAnswering from abhishek +author: John Snow Labs +name: muril_large_chaii +date: 2024-09-23 +tags: [hi, open_source, onnx, question_answering, bert] +task: Question Answering +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`muril_large_chaii` is a Hindi model originally trained by abhishek. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/muril_large_chaii_hi_5.5.0_3.0_1727070766823.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/muril_large_chaii_hi_5.5.0_3.0_1727070766823.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("muril_large_chaii","hi") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("muril_large_chaii", "hi") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|muril_large_chaii| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|hi| +|Size:|1.9 GB| + +## References + +https://huggingface.co/abhishek/muril-large-chaii \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-n_distilbert_sst5_padding20model_en.md b/docs/_posts/ahmedlone127/2024-09-23-n_distilbert_sst5_padding20model_en.md new file mode 100644 index 00000000000000..2f08870b8c9710 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-n_distilbert_sst5_padding20model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English n_distilbert_sst5_padding20model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: n_distilbert_sst5_padding20model +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_distilbert_sst5_padding20model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_distilbert_sst5_padding20model_en_5.5.0_3.0_1727110752226.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_distilbert_sst5_padding20model_en_5.5.0_3.0_1727110752226.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_sst5_padding20model","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("n_distilbert_sst5_padding20model", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_distilbert_sst5_padding20model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/N_distilbert_sst5_padding20model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_model_vierpiet_en.md b/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_model_vierpiet_en.md new file mode 100644 index 00000000000000..a6b6583a1e7eb3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-nepal_bhasa_model_vierpiet_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nepal_bhasa_model_vierpiet DistilBertForSequenceClassification from vierpiet +author: John Snow Labs +name: nepal_bhasa_model_vierpiet +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nepal_bhasa_model_vierpiet` is a English model originally trained by vierpiet. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nepal_bhasa_model_vierpiet_en_5.5.0_3.0_1727082665415.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nepal_bhasa_model_vierpiet_en_5.5.0_3.0_1727082665415.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("nepal_bhasa_model_vierpiet","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("nepal_bhasa_model_vierpiet", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nepal_bhasa_model_vierpiet| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/vierpiet/new_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-ner_model_nathali99_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-ner_model_nathali99_pipeline_en.md new file mode 100644 index 00000000000000..bc8974cf1fdcbd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-ner_model_nathali99_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ner_model_nathali99_pipeline pipeline BertForTokenClassification from Nathali99 +author: John Snow Labs +name: ner_model_nathali99_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_model_nathali99_pipeline` is a English model originally trained by Nathali99. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_model_nathali99_pipeline_en_5.5.0_3.0_1727129861197.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_model_nathali99_pipeline_en_5.5.0_3.0_1727129861197.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ner_model_nathali99_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ner_model_nathali99_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_model_nathali99_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/Nathali99/ner-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-paraphrase_russian_crossencoder_mminilmv2_l12_h384_v1_en.md b/docs/_posts/ahmedlone127/2024-09-23-paraphrase_russian_crossencoder_mminilmv2_l12_h384_v1_en.md new file mode 100644 index 00000000000000..4a744ac0b28ae7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-paraphrase_russian_crossencoder_mminilmv2_l12_h384_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English paraphrase_russian_crossencoder_mminilmv2_l12_h384_v1 XlmRoBertaForSequenceClassification from victorych22 +author: John Snow Labs +name: paraphrase_russian_crossencoder_mminilmv2_l12_h384_v1 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`paraphrase_russian_crossencoder_mminilmv2_l12_h384_v1` is a English model originally trained by victorych22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/paraphrase_russian_crossencoder_mminilmv2_l12_h384_v1_en_5.5.0_3.0_1727088308163.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/paraphrase_russian_crossencoder_mminilmv2_l12_h384_v1_en_5.5.0_3.0_1727088308163.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("paraphrase_russian_crossencoder_mminilmv2_l12_h384_v1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("paraphrase_russian_crossencoder_mminilmv2_l12_h384_v1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|paraphrase_russian_crossencoder_mminilmv2_l12_h384_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|145.7 MB| + +## References + +https://huggingface.co/victorych22/paraphrase-russian-crossencoder-mMiniLMv2-L12-H384-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-paraquantizar_en.md b/docs/_posts/ahmedlone127/2024-09-23-paraquantizar_en.md new file mode 100644 index 00000000000000..d48443f8580568 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-paraquantizar_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English paraquantizar RoBertaForSequenceClassification from Heber77 +author: John Snow Labs +name: paraquantizar +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`paraquantizar` is a English model originally trained by Heber77. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/paraquantizar_en_5.5.0_3.0_1727055575249.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/paraquantizar_en_5.5.0_3.0_1727055575249.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("paraquantizar","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("paraquantizar", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|paraquantizar| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.5 MB| + +## References + +https://huggingface.co/Heber77/paraquantizar \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-platzi_distilroberta_base_glue_mrpc_eduardo_ag_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-platzi_distilroberta_base_glue_mrpc_eduardo_ag_pipeline_en.md new file mode 100644 index 00000000000000..354cdf1cc13680 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-platzi_distilroberta_base_glue_mrpc_eduardo_ag_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English platzi_distilroberta_base_glue_mrpc_eduardo_ag_pipeline pipeline RoBertaForSequenceClassification from platzi +author: John Snow Labs +name: platzi_distilroberta_base_glue_mrpc_eduardo_ag_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`platzi_distilroberta_base_glue_mrpc_eduardo_ag_pipeline` is a English model originally trained by platzi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_glue_mrpc_eduardo_ag_pipeline_en_5.5.0_3.0_1727085913302.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/platzi_distilroberta_base_glue_mrpc_eduardo_ag_pipeline_en_5.5.0_3.0_1727085913302.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("platzi_distilroberta_base_glue_mrpc_eduardo_ag_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("platzi_distilroberta_base_glue_mrpc_eduardo_ag_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|platzi_distilroberta_base_glue_mrpc_eduardo_ag_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/platzi/platzi-distilroberta-base-glue-mrpc-eduardo-ag + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-pruned_30_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-pruned_30_model_pipeline_en.md new file mode 100644 index 00000000000000..fb340c0d237fbf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-pruned_30_model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English pruned_30_model_pipeline pipeline DistilBertForSequenceClassification from andygoh5 +author: John Snow Labs +name: pruned_30_model_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`pruned_30_model_pipeline` is a English model originally trained by andygoh5. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pruned_30_model_pipeline_en_5.5.0_3.0_1727082419368.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pruned_30_model_pipeline_en_5.5.0_3.0_1727082419368.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("pruned_30_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("pruned_30_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pruned_30_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/andygoh5/pruned-30-model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-red_green_classification_v3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-red_green_classification_v3_pipeline_en.md new file mode 100644 index 00000000000000..4f66fae0c4267f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-red_green_classification_v3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English red_green_classification_v3_pipeline pipeline DistilBertForSequenceClassification from pnr-svc +author: John Snow Labs +name: red_green_classification_v3_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`red_green_classification_v3_pipeline` is a English model originally trained by pnr-svc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/red_green_classification_v3_pipeline_en_5.5.0_3.0_1727108657324.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/red_green_classification_v3_pipeline_en_5.5.0_3.0_1727108657324.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("red_green_classification_v3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("red_green_classification_v3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|red_green_classification_v3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/pnr-svc/red-green-classification-v3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_airlines_news_multi_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_airlines_news_multi_en.md new file mode 100644 index 00000000000000..a3b37aaf2f4a8c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_airlines_news_multi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_airlines_news_multi RoBertaForSequenceClassification from dahe827 +author: John Snow Labs +name: roberta_base_airlines_news_multi +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_airlines_news_multi` is a English model originally trained by dahe827. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_airlines_news_multi_en_5.5.0_3.0_1727085376883.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_airlines_news_multi_en_5.5.0_3.0_1727085376883.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_airlines_news_multi","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_airlines_news_multi", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_airlines_news_multi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|434.1 MB| + +## References + +https://huggingface.co/dahe827/roberta-base-airlines-news-multi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_46_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_46_pipeline_en.md new file mode 100644 index 00000000000000..54cbd129f71e3b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_epoch_46_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_epoch_46_pipeline pipeline RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_46_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_46_pipeline` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_46_pipeline_en_5.5.0_3.0_1727122276146.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_46_pipeline_en_5.5.0_3.0_1727122276146.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_epoch_46_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_epoch_46_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_46_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_46 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetuned_sotus_v1_rile_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetuned_sotus_v1_rile_v1_pipeline_en.md new file mode 100644 index 00000000000000..20cec9de03f6db --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetuned_sotus_v1_rile_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_finetuned_sotus_v1_rile_v1_pipeline pipeline RoBertaForSequenceClassification from kghanlon +author: John Snow Labs +name: roberta_base_finetuned_sotus_v1_rile_v1_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_sotus_v1_rile_v1_pipeline` is a English model originally trained by kghanlon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_sotus_v1_rile_v1_pipeline_en_5.5.0_3.0_1727135671938.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_sotus_v1_rile_v1_pipeline_en_5.5.0_3.0_1727135671938.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_finetuned_sotus_v1_rile_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_finetuned_sotus_v1_rile_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_sotus_v1_rile_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.3 MB| + +## References + +https://huggingface.co/kghanlon/roberta-base-finetuned-SOTUs-v1-RILE-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetuned_wallisian_manual_7ep_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetuned_wallisian_manual_7ep_pipeline_en.md new file mode 100644 index 00000000000000..2b11b0a8aa1cf4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_finetuned_wallisian_manual_7ep_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_finetuned_wallisian_manual_7ep_pipeline pipeline RoBertaEmbeddings from btamm12 +author: John Snow Labs +name: roberta_base_finetuned_wallisian_manual_7ep_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_finetuned_wallisian_manual_7ep_pipeline` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_manual_7ep_pipeline_en_5.5.0_3.0_1727092303503.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_finetuned_wallisian_manual_7ep_pipeline_en_5.5.0_3.0_1727092303503.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_finetuned_wallisian_manual_7ep_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_finetuned_wallisian_manual_7ep_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_finetuned_wallisian_manual_7ep_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.0 MB| + +## References + +https://huggingface.co/btamm12/roberta-base-finetuned-wls-manual-7ep + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_go_emotions_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_go_emotions_en.md new file mode 100644 index 00000000000000..a5f56c9c243c42 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_go_emotions_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English roberta_base_go_emotions RoBertaForSequenceClassification from SamLowe +author: John Snow Labs +name: roberta_base_go_emotions +date: 2024-09-23 +tags: [roberta, en, open_source, sequence_classification, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_go_emotions` is a English model originally trained by SamLowe. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_go_emotions_en_5.5.0_3.0_1727082387278.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_go_emotions_en_5.5.0_3.0_1727082387278.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +tokenizer = Tokenizer()\ + .setInputCols("document")\ + .setOutputCol("token") + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_go_emotions","en")\ + .setInputCols(["document","token"])\ + .setOutputCol("class") + +pipeline = Pipeline().setStages([document_assembler, tokenizer, sequenceClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_go_emotions","en") + .setInputCols(Array("document","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_go_emotions| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +References + +https://huggingface.co/SamLowe/roberta-base-go_emotions \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_base_snli_mtreviso_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_snli_mtreviso_pipeline_en.md new file mode 100644 index 00000000000000..db9c1c3ba239f2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_base_snli_mtreviso_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_snli_mtreviso_pipeline pipeline RoBertaForSequenceClassification from mtreviso +author: John Snow Labs +name: roberta_base_snli_mtreviso_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_snli_mtreviso_pipeline` is a English model originally trained by mtreviso. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_snli_mtreviso_pipeline_en_5.5.0_3.0_1727134870752.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_snli_mtreviso_pipeline_en_5.5.0_3.0_1727134870752.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_snli_mtreviso_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_snli_mtreviso_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_snli_mtreviso_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|447.9 MB| + +## References + +https://huggingface.co/mtreviso/roberta-base-snli + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_conll_epoch_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_conll_epoch_4_pipeline_en.md new file mode 100644 index 00000000000000..003b15c78fab18 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_conll_epoch_4_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_conll_epoch_4_pipeline pipeline RoBertaForTokenClassification from ICT2214Team7 +author: John Snow Labs +name: roberta_conll_epoch_4_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_conll_epoch_4_pipeline` is a English model originally trained by ICT2214Team7. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_conll_epoch_4_pipeline_en_5.5.0_3.0_1727081480063.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_conll_epoch_4_pipeline_en_5.5.0_3.0_1727081480063.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_conll_epoch_4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_conll_epoch_4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_conll_epoch_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.6 MB| + +## References + +https://huggingface.co/ICT2214Team7/RoBERTa_conll_epoch_4 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_finetuned_danish_task_b_100k_5_labels_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_finetuned_danish_task_b_100k_5_labels_en.md new file mode 100644 index 00000000000000..b5eed69c84ee3c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_finetuned_danish_task_b_100k_5_labels_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_finetuned_danish_task_b_100k_5_labels RoBertaForSequenceClassification from bitsanlp +author: John Snow Labs +name: roberta_finetuned_danish_task_b_100k_5_labels +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_finetuned_danish_task_b_100k_5_labels` is a English model originally trained by bitsanlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_finetuned_danish_task_b_100k_5_labels_en_5.5.0_3.0_1727085646890.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_finetuned_danish_task_b_100k_5_labels_en_5.5.0_3.0_1727085646890.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_finetuned_danish_task_b_100k_5_labels","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_finetuned_danish_task_b_100k_5_labels", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_finetuned_danish_task_b_100k_5_labels| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/bitsanlp/roberta-finetuned-DA-task-B-100k-5-labels \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_large_conv_contradiction_detector_v0_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_conv_contradiction_detector_v0_en.md new file mode 100644 index 00000000000000..6df1ed738cb00b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_conv_contradiction_detector_v0_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_conv_contradiction_detector_v0 RoBertaForSequenceClassification from ynie +author: John Snow Labs +name: roberta_large_conv_contradiction_detector_v0 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_conv_contradiction_detector_v0` is a English model originally trained by ynie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_conv_contradiction_detector_v0_en_5.5.0_3.0_1727086292184.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_conv_contradiction_detector_v0_en_5.5.0_3.0_1727086292184.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_conv_contradiction_detector_v0","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_large_conv_contradiction_detector_v0", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_conv_contradiction_detector_v0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ynie/roberta-large_conv_contradiction_detector_v0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-roberta_large_conv_contradiction_detector_v0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_conv_contradiction_detector_v0_pipeline_en.md new file mode 100644 index 00000000000000..0ba8006fdc86c6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-roberta_large_conv_contradiction_detector_v0_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_conv_contradiction_detector_v0_pipeline pipeline RoBertaForSequenceClassification from ynie +author: John Snow Labs +name: roberta_large_conv_contradiction_detector_v0_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_conv_contradiction_detector_v0_pipeline` is a English model originally trained by ynie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_conv_contradiction_detector_v0_pipeline_en_5.5.0_3.0_1727086363656.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_conv_contradiction_detector_v0_pipeline_en_5.5.0_3.0_1727086363656.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_conv_contradiction_detector_v0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_conv_contradiction_detector_v0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_conv_contradiction_detector_v0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ynie/roberta-large_conv_contradiction_detector_v0 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sector_multilabel_climatebert_f_en.md b/docs/_posts/ahmedlone127/2024-09-23-sector_multilabel_climatebert_f_en.md new file mode 100644 index 00000000000000..3cc5011fac7648 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sector_multilabel_climatebert_f_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sector_multilabel_climatebert_f RoBertaForSequenceClassification from GIZ +author: John Snow Labs +name: sector_multilabel_climatebert_f +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sector_multilabel_climatebert_f` is a English model originally trained by GIZ. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sector_multilabel_climatebert_f_en_5.5.0_3.0_1727085913596.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sector_multilabel_climatebert_f_en_5.5.0_3.0_1727085913596.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("sector_multilabel_climatebert_f","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sector_multilabel_climatebert_f", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sector_multilabel_climatebert_f| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|309.7 MB| + +## References + +https://huggingface.co/GIZ/SECTOR-multilabel-climatebert_f \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_buddhist_sanskrit_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_buddhist_sanskrit_en.md new file mode 100644 index 00000000000000..f5f916357820f1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_buddhist_sanskrit_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_buddhist_sanskrit BertSentenceEmbeddings from Matej +author: John Snow Labs +name: sent_bert_base_buddhist_sanskrit +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_buddhist_sanskrit` is a English model originally trained by Matej. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_buddhist_sanskrit_en_5.5.0_3.0_1727105457117.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_buddhist_sanskrit_en_5.5.0_3.0_1727105457117.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_buddhist_sanskrit","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_buddhist_sanskrit","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_buddhist_sanskrit| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|406.8 MB| + +## References + +https://huggingface.co/Matej/bert-base-buddhist-sanskrit \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_buddhist_sanskrit_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_buddhist_sanskrit_pipeline_en.md new file mode 100644 index 00000000000000..ff38966dcc8479 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_buddhist_sanskrit_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_buddhist_sanskrit_pipeline pipeline BertSentenceEmbeddings from Matej +author: John Snow Labs +name: sent_bert_base_buddhist_sanskrit_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_buddhist_sanskrit_pipeline` is a English model originally trained by Matej. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_buddhist_sanskrit_pipeline_en_5.5.0_3.0_1727105476589.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_buddhist_sanskrit_pipeline_en_5.5.0_3.0_1727105476589.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_buddhist_sanskrit_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_buddhist_sanskrit_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_buddhist_sanskrit_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.3 MB| + +## References + +https://huggingface.co/Matej/bert-base-buddhist-sanskrit + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_japanese_cased_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_japanese_cased_en.md new file mode 100644 index 00000000000000..11049c42709e03 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_english_japanese_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_english_japanese_cased BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_japanese_cased +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_japanese_cased` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_japanese_cased_en_5.5.0_3.0_1727104978654.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_japanese_cased_en_5.5.0_3.0_1727104978654.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_japanese_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_japanese_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_japanese_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|416.3 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-ja-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_stackoverflow_comments_1m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_stackoverflow_comments_1m_pipeline_en.md new file mode 100644 index 00000000000000..37648299a88fa5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_stackoverflow_comments_1m_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_stackoverflow_comments_1m_pipeline pipeline BertSentenceEmbeddings from giganticode +author: John Snow Labs +name: sent_bert_base_stackoverflow_comments_1m_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_stackoverflow_comments_1m_pipeline` is a English model originally trained by giganticode. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_stackoverflow_comments_1m_pipeline_en_5.5.0_3.0_1727122982883.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_stackoverflow_comments_1m_pipeline_en_5.5.0_3.0_1727122982883.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_stackoverflow_comments_1m_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_stackoverflow_comments_1m_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_stackoverflow_comments_1m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.5 MB| + +## References + +https://huggingface.co/giganticode/bert-base-StackOverflow-comments_1M + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_danish_pipeline_da.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_danish_pipeline_da.md new file mode 100644 index 00000000000000..bf933cfb0a4583 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_danish_pipeline_da.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Danish sent_bert_base_uncased_danish_pipeline pipeline BertSentenceEmbeddings from KennethTM +author: John Snow Labs +name: sent_bert_base_uncased_danish_pipeline +date: 2024-09-23 +tags: [da, open_source, pipeline, onnx] +task: Embeddings +language: da +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_danish_pipeline` is a Danish model originally trained by KennethTM. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_danish_pipeline_da_5.5.0_3.0_1727090911732.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_danish_pipeline_da_5.5.0_3.0_1727090911732.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_danish_pipeline", lang = "da") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_danish_pipeline", lang = "da") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_danish_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|da| +|Size:|408.6 MB| + +## References + +https://huggingface.co/KennethTM/bert-base-uncased-danish + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_duplicate_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_duplicate_pipeline_en.md new file mode 100644 index 00000000000000..08b9729e4e7328 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_duplicate_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_duplicate_pipeline pipeline BertSentenceEmbeddings from julien-c +author: John Snow Labs +name: sent_bert_base_uncased_duplicate_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_duplicate_pipeline` is a English model originally trained by julien-c. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_duplicate_pipeline_en_5.5.0_3.0_1727105125107.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_duplicate_pipeline_en_5.5.0_3.0_1727105125107.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_duplicate_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_duplicate_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_duplicate_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/julien-c/bert-base-uncased-duplicate + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_imdb_medhabi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_imdb_medhabi_pipeline_en.md new file mode 100644 index 00000000000000..4282aa1a9a4dc2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_imdb_medhabi_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_imdb_medhabi_pipeline pipeline BertSentenceEmbeddings from medhabi +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_imdb_medhabi_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_imdb_medhabi_pipeline` is a English model originally trained by medhabi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_imdb_medhabi_pipeline_en_5.5.0_3.0_1727113952916.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_imdb_medhabi_pipeline_en_5.5.0_3.0_1727113952916.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_finetuned_imdb_medhabi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_finetuned_imdb_medhabi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_imdb_medhabi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/medhabi/bert-base-uncased-finetuned-imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower_en.md new file mode 100644 index 00000000000000..56af94080fd2d8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower BertSentenceEmbeddings from btamm12 +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower_en_5.5.0_3.0_1727109868052.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower_en_5.5.0_3.0_1727109868052.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_wallisian_manual_2ep_lower| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/btamm12/bert-base-uncased-finetuned-wls-manual-2ep-lower \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_igory1999_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_igory1999_en.md new file mode 100644 index 00000000000000..ab1459e237d3a5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_igory1999_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_igory1999 BertSentenceEmbeddings from igory1999 +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_igory1999 +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_igory1999` is a English model originally trained by igory1999. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_igory1999_en_5.5.0_3.0_1727105301459.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_igory1999_en_5.5.0_3.0_1727105301459.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_igory1999","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_igory1999","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_igory1999| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/igory1999/bert-base-uncased-issues-128 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_igory1999_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_igory1999_pipeline_en.md new file mode 100644 index 00000000000000..5659f4473bc823 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_base_uncased_issues_128_igory1999_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_igory1999_pipeline pipeline BertSentenceEmbeddings from igory1999 +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_igory1999_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_igory1999_pipeline` is a English model originally trained by igory1999. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_igory1999_pipeline_en_5.5.0_3.0_1727105321183.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_igory1999_pipeline_en_5.5.0_3.0_1727105321183.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_issues_128_igory1999_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_issues_128_igory1999_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_igory1999_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/igory1999/bert-base-uncased-issues-128 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_9_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_9_pipeline_en.md new file mode 100644 index 00000000000000..bf18cb815abb3b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_9_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_9_pipeline pipeline BertSentenceEmbeddings from jojoUla +author: John Snow Labs +name: sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_9_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_9_pipeline` is a English model originally trained by jojoUla. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_9_pipeline_en_5.5.0_3.0_1727113541166.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_9_pipeline_en_5.5.0_3.0_1727113541166.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_9_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_9_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_large_cased_sigir_support_refute_norwegian_label_40_2nd_test_lr10_8_9_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/jojoUla/bert-large-cased-sigir-support-refute-no-label-40-2nd-test-LR10-8-9 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_mini_domain_adapted_imdb_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_mini_domain_adapted_imdb_en.md new file mode 100644 index 00000000000000..c6a865565b5521 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_mini_domain_adapted_imdb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_mini_domain_adapted_imdb BertSentenceEmbeddings from rasyosef +author: John Snow Labs +name: sent_bert_mini_domain_adapted_imdb +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_mini_domain_adapted_imdb` is a English model originally trained by rasyosef. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_mini_domain_adapted_imdb_en_5.5.0_3.0_1727122774835.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_mini_domain_adapted_imdb_en_5.5.0_3.0_1727122774835.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_mini_domain_adapted_imdb","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_mini_domain_adapted_imdb","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_mini_domain_adapted_imdb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|41.9 MB| + +## References + +https://huggingface.co/rasyosef/bert-mini-domain-adapted-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_uncased_l_10_h_512_a_8_cord19_200616_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_uncased_l_10_h_512_a_8_cord19_200616_en.md new file mode 100644 index 00000000000000..aa24732c258a47 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_uncased_l_10_h_512_a_8_cord19_200616_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_uncased_l_10_h_512_a_8_cord19_200616 BertSentenceEmbeddings from aodiniz +author: John Snow Labs +name: sent_bert_uncased_l_10_h_512_a_8_cord19_200616 +date: 2024-09-23 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_uncased_l_10_h_512_a_8_cord19_200616` is a English model originally trained by aodiniz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_uncased_l_10_h_512_a_8_cord19_200616_en_5.5.0_3.0_1727102076204.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_uncased_l_10_h_512_a_8_cord19_200616_en_5.5.0_3.0_1727102076204.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_uncased_l_10_h_512_a_8_cord19_200616","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_uncased_l_10_h_512_a_8_cord19_200616","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_uncased_l_10_h_512_a_8_cord19_200616| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|177.4 MB| + +## References + +https://huggingface.co/aodiniz/bert_uncased_L-10_H-512_A-8_cord19-200616 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bert_uncased_l_10_h_512_a_8_cord19_200616_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_uncased_l_10_h_512_a_8_cord19_200616_pipeline_en.md new file mode 100644 index 00000000000000..0073eb0aea6970 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bert_uncased_l_10_h_512_a_8_cord19_200616_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_uncased_l_10_h_512_a_8_cord19_200616_pipeline pipeline BertSentenceEmbeddings from aodiniz +author: John Snow Labs +name: sent_bert_uncased_l_10_h_512_a_8_cord19_200616_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_uncased_l_10_h_512_a_8_cord19_200616_pipeline` is a English model originally trained by aodiniz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_uncased_l_10_h_512_a_8_cord19_200616_pipeline_en_5.5.0_3.0_1727102084641.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_uncased_l_10_h_512_a_8_cord19_200616_pipeline_en_5.5.0_3.0_1727102084641.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_uncased_l_10_h_512_a_8_cord19_200616_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_uncased_l_10_h_512_a_8_cord19_200616_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_uncased_l_10_h_512_a_8_cord19_200616_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|178.0 MB| + +## References + +https://huggingface.co/aodiniz/bert_uncased_L-10_H-512_A-8_cord19-200616 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_bio_mobilebert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_bio_mobilebert_pipeline_en.md new file mode 100644 index 00000000000000..a89900a86d228b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_bio_mobilebert_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bio_mobilebert_pipeline pipeline BertSentenceEmbeddings from nlpie +author: John Snow Labs +name: sent_bio_mobilebert_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bio_mobilebert_pipeline` is a English model originally trained by nlpie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bio_mobilebert_pipeline_en_5.5.0_3.0_1727105334524.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bio_mobilebert_pipeline_en_5.5.0_3.0_1727105334524.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bio_mobilebert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bio_mobilebert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bio_mobilebert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|93.1 MB| + +## References + +https://huggingface.co/nlpie/bio-mobilebert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_dajobbert_base_uncased_da.md b/docs/_posts/ahmedlone127/2024-09-23-sent_dajobbert_base_uncased_da.md new file mode 100644 index 00000000000000..2241cae753e33b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_dajobbert_base_uncased_da.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Danish sent_dajobbert_base_uncased BertSentenceEmbeddings from jjzha +author: John Snow Labs +name: sent_dajobbert_base_uncased +date: 2024-09-23 +tags: [da, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: da +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_dajobbert_base_uncased` is a Danish model originally trained by jjzha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_dajobbert_base_uncased_da_5.5.0_3.0_1727109759877.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_dajobbert_base_uncased_da_5.5.0_3.0_1727109759877.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_dajobbert_base_uncased","da") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_dajobbert_base_uncased","da") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_dajobbert_base_uncased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|da| +|Size:|411.3 MB| + +## References + +https://huggingface.co/jjzha/dajobbert-base-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_danish_legal_bert_base_pipeline_da.md b/docs/_posts/ahmedlone127/2024-09-23-sent_danish_legal_bert_base_pipeline_da.md new file mode 100644 index 00000000000000..0b5dd7512b7477 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_danish_legal_bert_base_pipeline_da.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Danish sent_danish_legal_bert_base_pipeline pipeline BertSentenceEmbeddings from coastalcph +author: John Snow Labs +name: sent_danish_legal_bert_base_pipeline +date: 2024-09-23 +tags: [da, open_source, pipeline, onnx] +task: Embeddings +language: da +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_danish_legal_bert_base_pipeline` is a Danish model originally trained by coastalcph. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_danish_legal_bert_base_pipeline_da_5.5.0_3.0_1727123296342.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_danish_legal_bert_base_pipeline_da_5.5.0_3.0_1727123296342.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_danish_legal_bert_base_pipeline", lang = "da") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_danish_legal_bert_base_pipeline", lang = "da") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_danish_legal_bert_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|da| +|Size:|412.1 MB| + +## References + +https://huggingface.co/coastalcph/danish-legal-bert-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_gujarati_bert_pipeline_gu.md b/docs/_posts/ahmedlone127/2024-09-23-sent_gujarati_bert_pipeline_gu.md new file mode 100644 index 00000000000000..e0f1abd75a6c5e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_gujarati_bert_pipeline_gu.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Gujarati sent_gujarati_bert_pipeline pipeline BertSentenceEmbeddings from l3cube-pune +author: John Snow Labs +name: sent_gujarati_bert_pipeline +date: 2024-09-23 +tags: [gu, open_source, pipeline, onnx] +task: Embeddings +language: gu +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_gujarati_bert_pipeline` is a Gujarati model originally trained by l3cube-pune. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_gujarati_bert_pipeline_gu_5.5.0_3.0_1727101781243.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_gujarati_bert_pipeline_gu_5.5.0_3.0_1727101781243.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_gujarati_bert_pipeline", lang = "gu") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_gujarati_bert_pipeline", lang = "gu") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_gujarati_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|gu| +|Size:|891.0 MB| + +## References + +https://huggingface.co/l3cube-pune/gujarati-bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_hindi_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_hindi_bert_pipeline_en.md new file mode 100644 index 00000000000000..9363fb2b4b0b64 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_hindi_bert_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_hindi_bert_pipeline pipeline BertSentenceEmbeddings from sukritin +author: John Snow Labs +name: sent_hindi_bert_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_hindi_bert_pipeline` is a English model originally trained by sukritin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_hindi_bert_pipeline_en_5.5.0_3.0_1727110229750.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_hindi_bert_pipeline_en_5.5.0_3.0_1727110229750.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_hindi_bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_hindi_bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_hindi_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|609.8 MB| + +## References + +https://huggingface.co/sukritin/hindi-bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_malay_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_malay_bert_pipeline_en.md new file mode 100644 index 00000000000000..dcc9b4b67d8da0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_malay_bert_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_malay_bert_pipeline pipeline BertSentenceEmbeddings from NLP4H +author: John Snow Labs +name: sent_malay_bert_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_malay_bert_pipeline` is a English model originally trained by NLP4H. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_malay_bert_pipeline_en_5.5.0_3.0_1727101712886.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_malay_bert_pipeline_en_5.5.0_3.0_1727101712886.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_malay_bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_malay_bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_malay_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.8 MB| + +## References + +https://huggingface.co/NLP4H/ms_bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_mobilebert_sanskrit_saskta_pre_training_complete_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_mobilebert_sanskrit_saskta_pre_training_complete_pipeline_en.md new file mode 100644 index 00000000000000..5e302e980ffa19 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_mobilebert_sanskrit_saskta_pre_training_complete_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_mobilebert_sanskrit_saskta_pre_training_complete_pipeline pipeline BertSentenceEmbeddings from gokuls +author: John Snow Labs +name: sent_mobilebert_sanskrit_saskta_pre_training_complete_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_mobilebert_sanskrit_saskta_pre_training_complete_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_mobilebert_sanskrit_saskta_pre_training_complete_pipeline_en_5.5.0_3.0_1727105593324.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_mobilebert_sanskrit_saskta_pre_training_complete_pipeline_en_5.5.0_3.0_1727105593324.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_mobilebert_sanskrit_saskta_pre_training_complete_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_mobilebert_sanskrit_saskta_pre_training_complete_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_mobilebert_sanskrit_saskta_pre_training_complete_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|93.1 MB| + +## References + +https://huggingface.co/gokuls/mobilebert_sa_pre-training-complete + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_small_mlm_rotten_tomatoes_custom_tokenizer_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_small_mlm_rotten_tomatoes_custom_tokenizer_pipeline_en.md new file mode 100644 index 00000000000000..6c8921cd40d16e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_small_mlm_rotten_tomatoes_custom_tokenizer_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_small_mlm_rotten_tomatoes_custom_tokenizer_pipeline pipeline BertSentenceEmbeddings from muhtasham +author: John Snow Labs +name: sent_small_mlm_rotten_tomatoes_custom_tokenizer_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_small_mlm_rotten_tomatoes_custom_tokenizer_pipeline` is a English model originally trained by muhtasham. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_small_mlm_rotten_tomatoes_custom_tokenizer_pipeline_en_5.5.0_3.0_1727109905716.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_small_mlm_rotten_tomatoes_custom_tokenizer_pipeline_en_5.5.0_3.0_1727109905716.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_small_mlm_rotten_tomatoes_custom_tokenizer_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_small_mlm_rotten_tomatoes_custom_tokenizer_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_small_mlm_rotten_tomatoes_custom_tokenizer_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|107.5 MB| + +## References + +https://huggingface.co/muhtasham/small-mlm-rotten_tomatoes-custom-tokenizer + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sent_test_bert_base_uncased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sent_test_bert_base_uncased_pipeline_en.md new file mode 100644 index 00000000000000..49d33f07c3fb43 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sent_test_bert_base_uncased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_test_bert_base_uncased_pipeline pipeline BertSentenceEmbeddings from kkkzzzkkk +author: John Snow Labs +name: sent_test_bert_base_uncased_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_test_bert_base_uncased_pipeline` is a English model originally trained by kkkzzzkkk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_test_bert_base_uncased_pipeline_en_5.5.0_3.0_1727123045148.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_test_bert_base_uncased_pipeline_en_5.5.0_3.0_1727123045148.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_test_bert_base_uncased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_test_bert_base_uncased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_test_bert_base_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/kkkzzzkkk/test_bert-base-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sentiment_analysis_model_team_28_a01794830_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sentiment_analysis_model_team_28_a01794830_pipeline_en.md new file mode 100644 index 00000000000000..8836a9e7f51053 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sentiment_analysis_model_team_28_a01794830_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_analysis_model_team_28_a01794830_pipeline pipeline DistilBertForSequenceClassification from a01794830 +author: John Snow Labs +name: sentiment_analysis_model_team_28_a01794830_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_model_team_28_a01794830_pipeline` is a English model originally trained by a01794830. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_model_team_28_a01794830_pipeline_en_5.5.0_3.0_1727094156564.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_model_team_28_a01794830_pipeline_en_5.5.0_3.0_1727094156564.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentiment_analysis_model_team_28_a01794830_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentiment_analysis_model_team_28_a01794830_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_model_team_28_a01794830_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/a01794830/sentiment-analysis-model-team-28 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sentimentanalysis_imdb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sentimentanalysis_imdb_pipeline_en.md new file mode 100644 index 00000000000000..181aea145ee75a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sentimentanalysis_imdb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentimentanalysis_imdb_pipeline pipeline DistilBertForSequenceClassification from johnchangbviwit +author: John Snow Labs +name: sentimentanalysis_imdb_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentimentanalysis_imdb_pipeline` is a English model originally trained by johnchangbviwit. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentimentanalysis_imdb_pipeline_en_5.5.0_3.0_1727059897505.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentimentanalysis_imdb_pipeline_en_5.5.0_3.0_1727059897505.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentimentanalysis_imdb_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentimentanalysis_imdb_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentimentanalysis_imdb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/johnchangbviwit/sentimentanalysis-imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sentiments_analysis_roberta_en.md b/docs/_posts/ahmedlone127/2024-09-23-sentiments_analysis_roberta_en.md new file mode 100644 index 00000000000000..abe88620dd813d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sentiments_analysis_roberta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiments_analysis_roberta RoBertaForSequenceClassification from Junr-syl +author: John Snow Labs +name: sentiments_analysis_roberta +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiments_analysis_roberta` is a English model originally trained by Junr-syl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiments_analysis_roberta_en_5.5.0_3.0_1727086030953.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiments_analysis_roberta_en_5.5.0_3.0_1727086030953.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiments_analysis_roberta","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("sentiments_analysis_roberta", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiments_analysis_roberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|438.9 MB| + +## References + +https://huggingface.co/Junr-syl/sentiments_analysis_Roberta \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sentiments_analysis_roberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sentiments_analysis_roberta_pipeline_en.md new file mode 100644 index 00000000000000..ecadd18baf6512 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sentiments_analysis_roberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiments_analysis_roberta_pipeline pipeline RoBertaForSequenceClassification from Junr-syl +author: John Snow Labs +name: sentiments_analysis_roberta_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiments_analysis_roberta_pipeline` is a English model originally trained by Junr-syl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiments_analysis_roberta_pipeline_en_5.5.0_3.0_1727086055443.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiments_analysis_roberta_pipeline_en_5.5.0_3.0_1727086055443.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentiments_analysis_roberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentiments_analysis_roberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiments_analysis_roberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|439.0 MB| + +## References + +https://huggingface.co/Junr-syl/sentiments_analysis_Roberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-social_media_sanskrit_saskta_finetuned_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-social_media_sanskrit_saskta_finetuned_2_pipeline_en.md new file mode 100644 index 00000000000000..f78cbc2e5eb7bd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-social_media_sanskrit_saskta_finetuned_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English social_media_sanskrit_saskta_finetuned_2_pipeline pipeline DistilBertForSequenceClassification from Kwaku +author: John Snow Labs +name: social_media_sanskrit_saskta_finetuned_2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`social_media_sanskrit_saskta_finetuned_2_pipeline` is a English model originally trained by Kwaku. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/social_media_sanskrit_saskta_finetuned_2_pipeline_en_5.5.0_3.0_1727093632102.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/social_media_sanskrit_saskta_finetuned_2_pipeline_en_5.5.0_3.0_1727093632102.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("social_media_sanskrit_saskta_finetuned_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("social_media_sanskrit_saskta_finetuned_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|social_media_sanskrit_saskta_finetuned_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Kwaku/social_media_sa_finetuned_2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-sst2_padding90model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-sst2_padding90model_pipeline_en.md new file mode 100644 index 00000000000000..a897e27a9629d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-sst2_padding90model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sst2_padding90model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: sst2_padding90model_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sst2_padding90model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sst2_padding90model_pipeline_en_5.5.0_3.0_1727082128279.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sst2_padding90model_pipeline_en_5.5.0_3.0_1727082128279.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sst2_padding90model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sst2_padding90model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sst2_padding90model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/sst2_padding90model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31_en.md b/docs/_posts/ahmedlone127/2024-09-23-stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31_en.md new file mode 100644 index 00000000000000..1681e4a2f6a361 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31 DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31_en_5.5.0_3.0_1727110649602.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31_en_5.5.0_3.0_1727110649602.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/stego-classifier-checkpoint-epoch-10-2024-07-26_16-19-31 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31_pipeline_en.md new file mode 100644 index 00000000000000..cb8e9b3923543b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31_pipeline pipeline DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31_pipeline` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31_pipeline_en_5.5.0_3.0_1727110661338.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31_pipeline_en_5.5.0_3.0_1727110661338.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stego_classifier_checkpoint_epoch_10_2024_07_26_16_19_31_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/stego-classifier-checkpoint-epoch-10-2024-07-26_16-19-31 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-tamilroberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-tamilroberta_pipeline_en.md new file mode 100644 index 00000000000000..4017d593552e7c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-tamilroberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tamilroberta_pipeline pipeline RoBertaEmbeddings from apkbala107 +author: John Snow Labs +name: tamilroberta_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tamilroberta_pipeline` is a English model originally trained by apkbala107. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tamilroberta_pipeline_en_5.5.0_3.0_1727121723689.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tamilroberta_pipeline_en_5.5.0_3.0_1727121723689.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tamilroberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tamilroberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tamilroberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|310.2 MB| + +## References + +https://huggingface.co/apkbala107/tamilroberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-test1_sss2000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-test1_sss2000_pipeline_en.md new file mode 100644 index 00000000000000..67755238346eb5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-test1_sss2000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English test1_sss2000_pipeline pipeline DistilBertForSequenceClassification from sss2000 +author: John Snow Labs +name: test1_sss2000_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test1_sss2000_pipeline` is a English model originally trained by sss2000. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test1_sss2000_pipeline_en_5.5.0_3.0_1727059360280.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test1_sss2000_pipeline_en_5.5.0_3.0_1727059360280.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test1_sss2000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test1_sss2000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test1_sss2000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sss2000/test1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-test_trainerb2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-test_trainerb2_pipeline_en.md new file mode 100644 index 00000000000000..b49bbdffffb4d8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-test_trainerb2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English test_trainerb2_pipeline pipeline DistilBertForSequenceClassification from SimoneJLaudani +author: John Snow Labs +name: test_trainerb2_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_trainerb2_pipeline` is a English model originally trained by SimoneJLaudani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_trainerb2_pipeline_en_5.5.0_3.0_1727110753420.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_trainerb2_pipeline_en_5.5.0_3.0_1727110753420.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test_trainerb2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test_trainerb2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_trainerb2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/SimoneJLaudani/test_trainerb2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-testing_model_jim33282007_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-testing_model_jim33282007_pipeline_en.md new file mode 100644 index 00000000000000..a54067796383eb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-testing_model_jim33282007_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English testing_model_jim33282007_pipeline pipeline DistilBertForSequenceClassification from jim33282007 +author: John Snow Labs +name: testing_model_jim33282007_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`testing_model_jim33282007_pipeline` is a English model originally trained by jim33282007. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/testing_model_jim33282007_pipeline_en_5.5.0_3.0_1727082128409.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/testing_model_jim33282007_pipeline_en_5.5.0_3.0_1727082128409.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("testing_model_jim33282007_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("testing_model_jim33282007_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|testing_model_jim33282007_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jim33282007/testing_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-trialz_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-trialz_pipeline_en.md new file mode 100644 index 00000000000000..5604bdb16840b0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-trialz_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English trialz_pipeline pipeline RoBertaEmbeddings from JoAmps +author: John Snow Labs +name: trialz_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trialz_pipeline` is a English model originally trained by JoAmps. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trialz_pipeline_en_5.5.0_3.0_1727056728080.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trialz_pipeline_en_5.5.0_3.0_1727056728080.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("trialz_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("trialz_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trialz_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.6 MB| + +## References + +https://huggingface.co/JoAmps/trialz + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-twitterfin_padding90model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-twitterfin_padding90model_pipeline_en.md new file mode 100644 index 00000000000000..1e6e42ace673b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-twitterfin_padding90model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English twitterfin_padding90model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: twitterfin_padding90model_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitterfin_padding90model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitterfin_padding90model_pipeline_en_5.5.0_3.0_1727074153808.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitterfin_padding90model_pipeline_en_5.5.0_3.0_1727074153808.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("twitterfin_padding90model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("twitterfin_padding90model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitterfin_padding90model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/twitterfin_padding90model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_base_v3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_base_v3_pipeline_en.md new file mode 100644 index 00000000000000..ef3dc8544a050e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_base_v3_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_base_v3_pipeline pipeline WhisperForCTC from raiyan007 +author: John Snow Labs +name: whisper_base_v3_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_base_v3_pipeline` is a English model originally trained by raiyan007. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_base_v3_pipeline_en_5.5.0_3.0_1727118005617.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_base_v3_pipeline_en_5.5.0_3.0_1727118005617.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_base_v3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_base_v3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_base_v3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|642.8 MB| + +## References + +https://huggingface.co/raiyan007/whisper-base-v3 + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_child50k_timestretch_steplr_pipeline_ko.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_child50k_timestretch_steplr_pipeline_ko.md new file mode 100644 index 00000000000000..be11e88b09da41 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_child50k_timestretch_steplr_pipeline_ko.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Korean whisper_small_child50k_timestretch_steplr_pipeline pipeline WhisperForCTC from haseong8012 +author: John Snow Labs +name: whisper_small_child50k_timestretch_steplr_pipeline +date: 2024-09-23 +tags: [ko, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_child50k_timestretch_steplr_pipeline` is a Korean model originally trained by haseong8012. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_child50k_timestretch_steplr_pipeline_ko_5.5.0_3.0_1727052228583.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_child50k_timestretch_steplr_pipeline_ko_5.5.0_3.0_1727052228583.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_child50k_timestretch_steplr_pipeline", lang = "ko") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_child50k_timestretch_steplr_pipeline", lang = "ko") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_child50k_timestretch_steplr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ko| +|Size:|1.7 GB| + +## References + +https://huggingface.co/haseong8012/whisper-small_child50K_timestretch_stepLR + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_dutch_vl_pipeline_nl.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_dutch_vl_pipeline_nl.md new file mode 100644 index 00000000000000..9d1672966168e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_dutch_vl_pipeline_nl.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Dutch, Flemish whisper_small_dutch_vl_pipeline pipeline WhisperForCTC from fibleep +author: John Snow Labs +name: whisper_small_dutch_vl_pipeline +date: 2024-09-23 +tags: [nl, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: nl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_dutch_vl_pipeline` is a Dutch, Flemish model originally trained by fibleep. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_dutch_vl_pipeline_nl_5.5.0_3.0_1727116471808.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_dutch_vl_pipeline_nl_5.5.0_3.0_1727116471808.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_dutch_vl_pipeline", lang = "nl") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_dutch_vl_pipeline", lang = "nl") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_dutch_vl_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|nl| +|Size:|1.7 GB| + +## References + +https://huggingface.co/fibleep/whisper-small-nl-vl + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_kdn_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_kdn_pipeline_en.md new file mode 100644 index 00000000000000..4726c6b3b1455d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_kdn_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_kdn_pipeline pipeline WhisperForCTC from susmitabhatt +author: John Snow Labs +name: whisper_small_kdn_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_kdn_pipeline` is a English model originally trained by susmitabhatt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_kdn_pipeline_en_5.5.0_3.0_1727052472835.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_kdn_pipeline_en_5.5.0_3.0_1727052472835.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_kdn_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_kdn_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_kdn_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/susmitabhatt/whisper-small-kdn + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_small_tonga_tonga_islands_chuvash_albanian_pipeline_sq.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_tonga_tonga_islands_chuvash_albanian_pipeline_sq.md new file mode 100644 index 00000000000000..c44397e357b172 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_small_tonga_tonga_islands_chuvash_albanian_pipeline_sq.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Albanian whisper_small_tonga_tonga_islands_chuvash_albanian_pipeline pipeline WhisperForCTC from rishabhjain16 +author: John Snow Labs +name: whisper_small_tonga_tonga_islands_chuvash_albanian_pipeline +date: 2024-09-23 +tags: [sq, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: sq +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_tonga_tonga_islands_chuvash_albanian_pipeline` is a Albanian model originally trained by rishabhjain16. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_tonga_tonga_islands_chuvash_albanian_pipeline_sq_5.5.0_3.0_1727117970737.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_tonga_tonga_islands_chuvash_albanian_pipeline_sq_5.5.0_3.0_1727117970737.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_tonga_tonga_islands_chuvash_albanian_pipeline", lang = "sq") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_tonga_tonga_islands_chuvash_albanian_pipeline", lang = "sq") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_tonga_tonga_islands_chuvash_albanian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|sq| +|Size:|1.7 GB| + +## References + +https://huggingface.co/rishabhjain16/whisper-small_to_cv_albanian + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_minds14_english_us_b_koopman_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_minds14_english_us_b_koopman_pipeline_en.md new file mode 100644 index 00000000000000..0923de373427e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_minds14_english_us_b_koopman_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_tiny_minds14_english_us_b_koopman_pipeline pipeline WhisperForCTC from b-koopman +author: John Snow Labs +name: whisper_tiny_minds14_english_us_b_koopman_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_minds14_english_us_b_koopman_pipeline` is a English model originally trained by b-koopman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_english_us_b_koopman_pipeline_en_5.5.0_3.0_1727051433077.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_minds14_english_us_b_koopman_pipeline_en_5.5.0_3.0_1727051433077.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_tiny_minds14_english_us_b_koopman_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_tiny_minds14_english_us_b_koopman_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_minds14_english_us_b_koopman_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.9 MB| + +## References + +https://huggingface.co/b-koopman/whisper-tiny-minds14-en-US + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11_en.md b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11_en.md new file mode 100644 index 00000000000000..ee46a06127f2ad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11 WhisperForCTC from sgonzalezsilot +author: John Snow Labs +name: whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11 +date: 2024-09-23 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11` is a English model originally trained by sgonzalezsilot. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11_en_5.5.0_3.0_1727117371147.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11_en_5.5.0_3.0_1727117371147.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_tiny_spanish_spanish_nemo_unified_2024_06_26_09_12_11| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|390.6 MB| + +## References + +https://huggingface.co/sgonzalezsilot/whisper-tiny-spanish-es-Nemo_unified_2024-06-26_09-12-11 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-withinapps_ndd_mantisbt_test_tags_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-withinapps_ndd_mantisbt_test_tags_pipeline_en.md new file mode 100644 index 00000000000000..a617f9422d18b8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-withinapps_ndd_mantisbt_test_tags_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English withinapps_ndd_mantisbt_test_tags_pipeline pipeline DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: withinapps_ndd_mantisbt_test_tags_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`withinapps_ndd_mantisbt_test_tags_pipeline` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/withinapps_ndd_mantisbt_test_tags_pipeline_en_5.5.0_3.0_1727108302988.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/withinapps_ndd_mantisbt_test_tags_pipeline_en_5.5.0_3.0_1727108302988.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("withinapps_ndd_mantisbt_test_tags_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("withinapps_ndd_mantisbt_test_tags_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|withinapps_ndd_mantisbt_test_tags_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/WITHINAPPS_NDD-mantisbt_test-tags + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_english_sentweet_targeted_insult_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_english_sentweet_targeted_insult_en.md new file mode 100644 index 00000000000000..50b80e4fb07be0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_english_sentweet_targeted_insult_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_english_sentweet_targeted_insult XlmRoBertaForSequenceClassification from jayanta +author: John Snow Labs +name: xlm_roberta_base_english_sentweet_targeted_insult +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_english_sentweet_targeted_insult` is a English model originally trained by jayanta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_english_sentweet_targeted_insult_en_5.5.0_3.0_1727089332791.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_english_sentweet_targeted_insult_en_5.5.0_3.0_1727089332791.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_english_sentweet_targeted_insult","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_english_sentweet_targeted_insult", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_english_sentweet_targeted_insult| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|785.8 MB| + +## References + +https://huggingface.co/jayanta/xlm-roberta-base-english-sentweet-targeted-insult \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_english_sentweet_targeted_insult_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_english_sentweet_targeted_insult_pipeline_en.md new file mode 100644 index 00000000000000..53615b5f53a68e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_english_sentweet_targeted_insult_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_english_sentweet_targeted_insult_pipeline pipeline XlmRoBertaForSequenceClassification from jayanta +author: John Snow Labs +name: xlm_roberta_base_english_sentweet_targeted_insult_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_english_sentweet_targeted_insult_pipeline` is a English model originally trained by jayanta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_english_sentweet_targeted_insult_pipeline_en_5.5.0_3.0_1727089470422.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_english_sentweet_targeted_insult_pipeline_en_5.5.0_3.0_1727089470422.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_english_sentweet_targeted_insult_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_english_sentweet_targeted_insult_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_english_sentweet_targeted_insult_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|785.8 MB| + +## References + +https://huggingface.co/jayanta/xlm-roberta-base-english-sentweet-targeted-insult + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_final_mixed_train_1_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_final_mixed_train_1_en.md new file mode 100644 index 00000000000000..ddba5e8ae4b112 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_final_mixed_train_1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_final_mixed_train_1 XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_final_mixed_train_1 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_final_mixed_train_1` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_mixed_train_1_en_5.5.0_3.0_1727126747007.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_mixed_train_1_en_5.5.0_3.0_1727126747007.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_final_mixed_train_1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_final_mixed_train_1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_final_mixed_train_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|795.0 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Final_Mixed-train-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_all_ligerre_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_all_ligerre_en.md new file mode 100644 index 00000000000000..cae8867c22f489 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_all_ligerre_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_all_ligerre XlmRoBertaForTokenClassification from ligerre +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_all_ligerre +date: 2024-09-23 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_all_ligerre` is a English model originally trained by ligerre. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_ligerre_en_5.5.0_3.0_1727062079218.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_all_ligerre_en_5.5.0_3.0_1727062079218.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_ligerre","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_all_ligerre", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_all_ligerre| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|861.0 MB| + +## References + +https://huggingface.co/ligerre/xlm-roberta-base-finetuned-panx-all \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_french_esperesa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_french_esperesa_pipeline_en.md new file mode 100644 index 00000000000000..892eb990fc9dcc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_french_esperesa_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_esperesa_pipeline pipeline XlmRoBertaForTokenClassification from esperesa +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_esperesa_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_esperesa_pipeline` is a English model originally trained by esperesa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_esperesa_pipeline_en_5.5.0_3.0_1727132152169.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_esperesa_pipeline_en_5.5.0_3.0_1727132152169.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_esperesa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_esperesa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_esperesa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/esperesa/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_misterneil_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_misterneil_pipeline_en.md new file mode 100644 index 00000000000000..42300f7a29d902 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_french_misterneil_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_misterneil_pipeline pipeline XlmRoBertaForTokenClassification from misterneil +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_misterneil_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_misterneil_pipeline` is a English model originally trained by misterneil. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_misterneil_pipeline_en_5.5.0_3.0_1727133242943.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_misterneil_pipeline_en_5.5.0_3.0_1727133242943.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_misterneil_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_french_misterneil_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_misterneil_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/misterneil/xlm-roberta-base-finetuned-panx-de-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_thucdangvan020999_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_thucdangvan020999_pipeline_en.md new file mode 100644 index 00000000000000..842b11eeed404f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_finetuned_panx_german_thucdangvan020999_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_thucdangvan020999_pipeline pipeline XlmRoBertaForTokenClassification from thucdangvan020999 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_thucdangvan020999_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_thucdangvan020999_pipeline` is a English model originally trained by thucdangvan020999. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_thucdangvan020999_pipeline_en_5.5.0_3.0_1727132751090.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_thucdangvan020999_pipeline_en_5.5.0_3.0_1727132751090.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_thucdangvan020999_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_thucdangvan020999_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_thucdangvan020999_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/thucdangvan020999/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_italian_tweet_sentiment_italian_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_italian_tweet_sentiment_italian_pipeline_en.md new file mode 100644 index 00000000000000..c27e8e7562f8b8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_trimmed_italian_tweet_sentiment_italian_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_italian_tweet_sentiment_italian_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_italian_tweet_sentiment_italian_pipeline +date: 2024-09-23 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_italian_tweet_sentiment_italian_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_italian_tweet_sentiment_italian_pipeline_en_5.5.0_3.0_1727099948457.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_italian_tweet_sentiment_italian_pipeline_en_5.5.0_3.0_1727099948457.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_trimmed_italian_tweet_sentiment_italian_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_trimmed_italian_tweet_sentiment_italian_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_italian_tweet_sentiment_italian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|457.4 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-it-tweet-sentiment-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_xnli_spanish_trimmed_spanish_10000_en.md b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_xnli_spanish_trimmed_spanish_10000_en.md new file mode 100644 index 00000000000000..9382428145254a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-23-xlm_roberta_base_xnli_spanish_trimmed_spanish_10000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_xnli_spanish_trimmed_spanish_10000 XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_xnli_spanish_trimmed_spanish_10000 +date: 2024-09-23 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_xnli_spanish_trimmed_spanish_10000` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_spanish_trimmed_spanish_10000_en_5.5.0_3.0_1727089318019.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_xnli_spanish_trimmed_spanish_10000_en_5.5.0_3.0_1727089318019.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_xnli_spanish_trimmed_spanish_10000","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_xnli_spanish_trimmed_spanish_10000", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_xnli_spanish_trimmed_spanish_10000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|353.6 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-xnli-es-trimmed-es-10000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-0_0000005_0_999_rose_e_wang_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-0_0000005_0_999_rose_e_wang_pipeline_en.md new file mode 100644 index 00000000000000..4824866730192c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-0_0000005_0_999_rose_e_wang_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 0_0000005_0_999_rose_e_wang_pipeline pipeline RoBertaForSequenceClassification from rose-e-wang +author: John Snow Labs +name: 0_0000005_0_999_rose_e_wang_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`0_0000005_0_999_rose_e_wang_pipeline` is a English model originally trained by rose-e-wang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/0_0000005_0_999_rose_e_wang_pipeline_en_5.5.0_3.0_1727171846773.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/0_0000005_0_999_rose_e_wang_pipeline_en_5.5.0_3.0_1727171846773.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("0_0000005_0_999_rose_e_wang_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("0_0000005_0_999_rose_e_wang_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|0_0000005_0_999_rose_e_wang_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/rose-e-wang/0.0000005_0.999 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-2020_q4_50p_filtered_random_prog_from_q3_en.md b/docs/_posts/ahmedlone127/2024-09-24-2020_q4_50p_filtered_random_prog_from_q3_en.md new file mode 100644 index 00000000000000..5400fc166749a3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-2020_q4_50p_filtered_random_prog_from_q3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 2020_q4_50p_filtered_random_prog_from_q3 RoBertaEmbeddings from DouglasPontes +author: John Snow Labs +name: 2020_q4_50p_filtered_random_prog_from_q3 +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q4_50p_filtered_random_prog_from_q3` is a English model originally trained by DouglasPontes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q4_50p_filtered_random_prog_from_q3_en_5.5.0_3.0_1727168989425.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q4_50p_filtered_random_prog_from_q3_en_5.5.0_3.0_1727168989425.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("2020_q4_50p_filtered_random_prog_from_q3","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("2020_q4_50p_filtered_random_prog_from_q3","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q4_50p_filtered_random_prog_from_q3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/DouglasPontes/2020-Q4-50p-filtered-random-prog_from_Q3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-2020_q4_50p_filtered_random_prog_from_q3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-2020_q4_50p_filtered_random_prog_from_q3_pipeline_en.md new file mode 100644 index 00000000000000..5b20b93f1bbd08 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-2020_q4_50p_filtered_random_prog_from_q3_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English 2020_q4_50p_filtered_random_prog_from_q3_pipeline pipeline RoBertaEmbeddings from DouglasPontes +author: John Snow Labs +name: 2020_q4_50p_filtered_random_prog_from_q3_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2020_q4_50p_filtered_random_prog_from_q3_pipeline` is a English model originally trained by DouglasPontes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2020_q4_50p_filtered_random_prog_from_q3_pipeline_en_5.5.0_3.0_1727169013666.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2020_q4_50p_filtered_random_prog_from_q3_pipeline_en_5.5.0_3.0_1727169013666.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("2020_q4_50p_filtered_random_prog_from_q3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("2020_q4_50p_filtered_random_prog_from_q3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2020_q4_50p_filtered_random_prog_from_q3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/DouglasPontes/2020-Q4-50p-filtered-random-prog_from_Q3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-afro_xlmr_base_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-24-afro_xlmr_base_pipeline_xx.md new file mode 100644 index 00000000000000..32f732b7fb4b78 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-afro_xlmr_base_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual afro_xlmr_base_pipeline pipeline XlmRoBertaEmbeddings from Davlan +author: John Snow Labs +name: afro_xlmr_base_pipeline +date: 2024-09-24 +tags: [xx, open_source, pipeline, onnx] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`afro_xlmr_base_pipeline` is a Multilingual model originally trained by Davlan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afro_xlmr_base_pipeline_xx_5.5.0_3.0_1727209726819.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afro_xlmr_base_pipeline_xx_5.5.0_3.0_1727209726819.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("afro_xlmr_base_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("afro_xlmr_base_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afro_xlmr_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|1.0 GB| + +## References + +https://huggingface.co/Davlan/afro-xlmr-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-afro_xlmr_base_xx.md b/docs/_posts/ahmedlone127/2024-09-24-afro_xlmr_base_xx.md new file mode 100644 index 00000000000000..d9d92e0e59322c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-afro_xlmr_base_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual afro_xlmr_base XlmRoBertaEmbeddings from Davlan +author: John Snow Labs +name: afro_xlmr_base +date: 2024-09-24 +tags: [xx, open_source, onnx, embeddings, xlm_roberta] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`afro_xlmr_base` is a Multilingual model originally trained by Davlan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/afro_xlmr_base_xx_5.5.0_3.0_1727209666721.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/afro_xlmr_base_xx_5.5.0_3.0_1727209666721.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = XlmRoBertaEmbeddings.pretrained("afro_xlmr_base","xx") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = XlmRoBertaEmbeddings.pretrained("afro_xlmr_base","xx") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|afro_xlmr_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[xlm_roberta]| +|Language:|xx| +|Size:|1.0 GB| + +## References + +https://huggingface.co/Davlan/afro-xlmr-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-albert_base_japanese_v1_ja.md b/docs/_posts/ahmedlone127/2024-09-24-albert_base_japanese_v1_ja.md new file mode 100644 index 00000000000000..b04afdabf53d66 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-albert_base_japanese_v1_ja.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Japanese albert_base_japanese_v1 AlbertEmbeddings from ken11 +author: John Snow Labs +name: albert_base_japanese_v1 +date: 2024-09-24 +tags: [ja, open_source, onnx, embeddings, albert] +task: Embeddings +language: ja +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: AlbertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_base_japanese_v1` is a Japanese model originally trained by ken11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_base_japanese_v1_ja_5.5.0_3.0_1727220084075.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_base_japanese_v1_ja_5.5.0_3.0_1727220084075.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = AlbertEmbeddings.pretrained("albert_base_japanese_v1","ja") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = AlbertEmbeddings.pretrained("albert_base_japanese_v1","ja") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_base_japanese_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence, token]| +|Output Labels:|[albert]| +|Language:|ja| +|Size:|42.8 MB| + +## References + +https://huggingface.co/ken11/albert-base-japanese-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-albert_base_japanese_v1_pipeline_ja.md b/docs/_posts/ahmedlone127/2024-09-24-albert_base_japanese_v1_pipeline_ja.md new file mode 100644 index 00000000000000..a7936abbc8b2d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-albert_base_japanese_v1_pipeline_ja.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Japanese albert_base_japanese_v1_pipeline pipeline AlbertEmbeddings from ken11 +author: John Snow Labs +name: albert_base_japanese_v1_pipeline +date: 2024-09-24 +tags: [ja, open_source, pipeline, onnx] +task: Embeddings +language: ja +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_base_japanese_v1_pipeline` is a Japanese model originally trained by ken11. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_base_japanese_v1_pipeline_ja_5.5.0_3.0_1727220086439.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_base_japanese_v1_pipeline_ja_5.5.0_3.0_1727220086439.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("albert_base_japanese_v1_pipeline", lang = "ja") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("albert_base_japanese_v1_pipeline", lang = "ja") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_base_japanese_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ja| +|Size:|42.8 MB| + +## References + +https://huggingface.co/ken11/albert-base-japanese-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-albert_japanese_en.md b/docs/_posts/ahmedlone127/2024-09-24-albert_japanese_en.md new file mode 100644 index 00000000000000..b422d66c7930b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-albert_japanese_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English albert_japanese AlbertEmbeddings from ALINEAR +author: John Snow Labs +name: albert_japanese +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, albert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: AlbertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_japanese` is a English model originally trained by ALINEAR. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_japanese_en_5.5.0_3.0_1727220080203.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_japanese_en_5.5.0_3.0_1727220080203.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = AlbertEmbeddings.pretrained("albert_japanese","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = AlbertEmbeddings.pretrained("albert_japanese","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_japanese| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence, token]| +|Output Labels:|[albert]| +|Language:|en| +|Size:|42.9 MB| + +## References + +https://huggingface.co/ALINEAR/albert-japanese \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-albert_japanese_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-albert_japanese_pipeline_en.md new file mode 100644 index 00000000000000..3c45a989de2aab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-albert_japanese_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English albert_japanese_pipeline pipeline AlbertEmbeddings from ALINEAR +author: John Snow Labs +name: albert_japanese_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_japanese_pipeline` is a English model originally trained by ALINEAR. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_japanese_pipeline_en_5.5.0_3.0_1727220082589.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_japanese_pipeline_en_5.5.0_3.0_1727220082589.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("albert_japanese_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("albert_japanese_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_japanese_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|42.9 MB| + +## References + +https://huggingface.co/ALINEAR/albert-japanese + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-albert_news_classification_tw.md b/docs/_posts/ahmedlone127/2024-09-24-albert_news_classification_tw.md new file mode 100644 index 00000000000000..208cbbee2ead41 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-albert_news_classification_tw.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Twi albert_news_classification BertForSequenceClassification from clhuang +author: John Snow Labs +name: albert_news_classification +date: 2024-09-24 +tags: [tw, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: tw +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_news_classification` is a Twi model originally trained by clhuang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_news_classification_tw_5.5.0_3.0_1727213606690.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_news_classification_tw_5.5.0_3.0_1727213606690.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("albert_news_classification","tw") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("albert_news_classification", "tw") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_news_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|tw| +|Size:|39.8 MB| + +## References + +https://huggingface.co/clhuang/albert-news-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-albert_punctuation_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-albert_punctuation_pipeline_en.md new file mode 100644 index 00000000000000..5b8573b26c98b8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-albert_punctuation_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English albert_punctuation_pipeline pipeline BertForTokenClassification from Wikidepia +author: John Snow Labs +name: albert_punctuation_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_punctuation_pipeline` is a English model originally trained by Wikidepia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_punctuation_pipeline_en_5.5.0_3.0_1727203077794.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_punctuation_pipeline_en_5.5.0_3.0_1727203077794.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("albert_punctuation_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("albert_punctuation_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_punctuation_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|42.0 MB| + +## References + +https://huggingface.co/Wikidepia/albert-punctuation + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-albert_small_kor_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-albert_small_kor_v1_pipeline_en.md new file mode 100644 index 00000000000000..4f14f1098269d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-albert_small_kor_v1_pipeline_en.md @@ -0,0 +1,72 @@ +--- +layout: model +title: English albert_small_kor_v1_pipeline pipeline AlbertEmbeddings from bongsoo +author: John Snow Labs +name: albert_small_kor_v1_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albert_small_kor_v1_pipeline` is a English model originally trained by bongsoo. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albert_small_kor_v1_pipeline_en_5.5.0_3.0_1727158727845.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albert_small_kor_v1_pipeline_en_5.5.0_3.0_1727158727845.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +pipeline = PretrainedPipeline("albert_small_kor_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) +``` +```scala +val pipeline = new PretrainedPipeline("albert_small_kor_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albert_small_kor_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|41.7 MB| + +## References + +References + +https://huggingface.co/bongsoo/albert-small-kor-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-answer_equivalence_bert_en.md b/docs/_posts/ahmedlone127/2024-09-24-answer_equivalence_bert_en.md new file mode 100644 index 00000000000000..1e4489af0453f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-answer_equivalence_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English answer_equivalence_bert BertForSequenceClassification from zli12321 +author: John Snow Labs +name: answer_equivalence_bert +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`answer_equivalence_bert` is a English model originally trained by zli12321. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/answer_equivalence_bert_en_5.5.0_3.0_1727219409393.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/answer_equivalence_bert_en_5.5.0_3.0_1727219409393.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("answer_equivalence_bert","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("answer_equivalence_bert", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|answer_equivalence_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/zli12321/answer_equivalence_bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_ads_classification_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_ads_classification_en.md new file mode 100644 index 00000000000000..5bd40d8c6066d6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_ads_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_ads_classification BertForSequenceClassification from bondarchukb +author: John Snow Labs +name: bert_ads_classification +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_ads_classification` is a English model originally trained by bondarchukb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ads_classification_en_5.5.0_3.0_1727213686943.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ads_classification_en_5.5.0_3.0_1727213686943.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_ads_classification","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_ads_classification", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ads_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/bondarchukb/bert-ads-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_ads_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_ads_classification_pipeline_en.md new file mode 100644 index 00000000000000..f27dcd29b13863 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_ads_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_ads_classification_pipeline pipeline BertForSequenceClassification from bondarchukb +author: John Snow Labs +name: bert_ads_classification_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_ads_classification_pipeline` is a English model originally trained by bondarchukb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_ads_classification_pipeline_en_5.5.0_3.0_1727213707589.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_ads_classification_pipeline_en_5.5.0_3.0_1727213707589.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_ads_classification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_ads_classification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_ads_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/bondarchukb/bert-ads-classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_arabic_camelbert_catalan_99189_pretrain_resampled_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_arabic_camelbert_catalan_99189_pretrain_resampled_en.md new file mode 100644 index 00000000000000..9bbb1a20c70807 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_arabic_camelbert_catalan_99189_pretrain_resampled_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_arabic_camelbert_catalan_99189_pretrain_resampled BertForQuestionAnswering from MatMulMan +author: John Snow Labs +name: bert_base_arabic_camelbert_catalan_99189_pretrain_resampled +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_arabic_camelbert_catalan_99189_pretrain_resampled` is a English model originally trained by MatMulMan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_arabic_camelbert_catalan_99189_pretrain_resampled_en_5.5.0_3.0_1727206805653.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_arabic_camelbert_catalan_99189_pretrain_resampled_en_5.5.0_3.0_1727206805653.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_arabic_camelbert_catalan_99189_pretrain_resampled","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_arabic_camelbert_catalan_99189_pretrain_resampled", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_arabic_camelbert_catalan_99189_pretrain_resampled| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|406.6 MB| + +## References + +https://huggingface.co/MatMulMan/bert-base-arabic-camelbert-ca-99189-pretrain_resampled \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_arabic_camelbert_catalan_99189_pretrain_resampled_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_arabic_camelbert_catalan_99189_pretrain_resampled_pipeline_en.md new file mode 100644 index 00000000000000..b1223ff71a0ad7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_arabic_camelbert_catalan_99189_pretrain_resampled_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_arabic_camelbert_catalan_99189_pretrain_resampled_pipeline pipeline BertForQuestionAnswering from MatMulMan +author: John Snow Labs +name: bert_base_arabic_camelbert_catalan_99189_pretrain_resampled_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_arabic_camelbert_catalan_99189_pretrain_resampled_pipeline` is a English model originally trained by MatMulMan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_arabic_camelbert_catalan_99189_pretrain_resampled_pipeline_en_5.5.0_3.0_1727206827761.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_arabic_camelbert_catalan_99189_pretrain_resampled_pipeline_en_5.5.0_3.0_1727206827761.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_arabic_camelbert_catalan_99189_pretrain_resampled_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_arabic_camelbert_catalan_99189_pretrain_resampled_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_arabic_camelbert_catalan_99189_pretrain_resampled_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.6 MB| + +## References + +https://huggingface.co/MatMulMan/bert-base-arabic-camelbert-ca-99189-pretrain_resampled + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_finetuned_stsb_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_finetuned_stsb_en.md new file mode 100644 index 00000000000000..22b4775abcee67 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_finetuned_stsb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_cased_finetuned_stsb BertForSequenceClassification from gchhablani +author: John Snow Labs +name: bert_base_cased_finetuned_stsb +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_finetuned_stsb` is a English model originally trained by gchhablani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_stsb_en_5.5.0_3.0_1727218707497.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_stsb_en_5.5.0_3.0_1727218707497.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_cased_finetuned_stsb","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_cased_finetuned_stsb", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_finetuned_stsb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/gchhablani/bert-base-cased-finetuned-stsb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_finetuned_stsb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_finetuned_stsb_pipeline_en.md new file mode 100644 index 00000000000000..e559e53ad59c15 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_finetuned_stsb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_cased_finetuned_stsb_pipeline pipeline BertForSequenceClassification from gchhablani +author: John Snow Labs +name: bert_base_cased_finetuned_stsb_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_finetuned_stsb_pipeline` is a English model originally trained by gchhablani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_stsb_pipeline_en_5.5.0_3.0_1727218728753.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_stsb_pipeline_en_5.5.0_3.0_1727218728753.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_cased_finetuned_stsb_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_cased_finetuned_stsb_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_finetuned_stsb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/gchhablani/bert-base-cased-finetuned-stsb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_jennyc_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_jennyc_en.md new file mode 100644 index 00000000000000..e0f3dfd4812526 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_jennyc_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_cased_jennyc BertForQuestionAnswering from jennyc +author: John Snow Labs +name: bert_base_cased_jennyc +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_jennyc` is a English model originally trained by jennyc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_jennyc_en_5.5.0_3.0_1727175887390.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_jennyc_en_5.5.0_3.0_1727175887390.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_cased_jennyc","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_cased_jennyc", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_jennyc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|403.6 MB| + +## References + +https://huggingface.co/jennyc/bert-base-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_scmedium_scqa2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_scmedium_scqa2_pipeline_en.md new file mode 100644 index 00000000000000..3dedc015585163 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_cased_scmedium_scqa2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_cased_scmedium_scqa2_pipeline pipeline BertForQuestionAnswering from CambridgeMolecularEngineering +author: John Snow Labs +name: bert_base_cased_scmedium_scqa2_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_scmedium_scqa2_pipeline` is a English model originally trained by CambridgeMolecularEngineering. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_scmedium_scqa2_pipeline_en_5.5.0_3.0_1727175369752.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_scmedium_scqa2_pipeline_en_5.5.0_3.0_1727175369752.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_cased_scmedium_scqa2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_cased_scmedium_scqa2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_scmedium_scqa2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/CambridgeMolecularEngineering/bert-base-cased-scmedium-scqa2 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_chinese_finetuned_question_answering_4_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_chinese_finetuned_question_answering_4_en.md new file mode 100644 index 00000000000000..96503d1d0fe3e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_chinese_finetuned_question_answering_4_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_chinese_finetuned_question_answering_4 BertForQuestionAnswering from jazzson +author: John Snow Labs +name: bert_base_chinese_finetuned_question_answering_4 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_chinese_finetuned_question_answering_4` is a English model originally trained by jazzson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuned_question_answering_4_en_5.5.0_3.0_1727217040537.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuned_question_answering_4_en_5.5.0_3.0_1727217040537.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_chinese_finetuned_question_answering_4","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_chinese_finetuned_question_answering_4", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_chinese_finetuned_question_answering_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|381.1 MB| + +## References + +https://huggingface.co/jazzson/bert-base-chinese-finetuned-question-answering-4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_chinese_finetuned_question_answering_4_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_chinese_finetuned_question_answering_4_pipeline_en.md new file mode 100644 index 00000000000000..914b7a745201a7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_chinese_finetuned_question_answering_4_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_chinese_finetuned_question_answering_4_pipeline pipeline BertForQuestionAnswering from jazzson +author: John Snow Labs +name: bert_base_chinese_finetuned_question_answering_4_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_chinese_finetuned_question_answering_4_pipeline` is a English model originally trained by jazzson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuned_question_answering_4_pipeline_en_5.5.0_3.0_1727217060142.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuned_question_answering_4_pipeline_en_5.5.0_3.0_1727217060142.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_chinese_finetuned_question_answering_4_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_chinese_finetuned_question_answering_4_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_chinese_finetuned_question_answering_4_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|381.1 MB| + +## References + +https://huggingface.co/jazzson/bert-base-chinese-finetuned-question-answering-4 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_chinese_finetuned_question_answering_6_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_chinese_finetuned_question_answering_6_en.md new file mode 100644 index 00000000000000..8fdbc4ebdbd95c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_chinese_finetuned_question_answering_6_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_chinese_finetuned_question_answering_6 BertForQuestionAnswering from jazzson +author: John Snow Labs +name: bert_base_chinese_finetuned_question_answering_6 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_chinese_finetuned_question_answering_6` is a English model originally trained by jazzson. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuned_question_answering_6_en_5.5.0_3.0_1727216906697.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuned_question_answering_6_en_5.5.0_3.0_1727216906697.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_chinese_finetuned_question_answering_6","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_chinese_finetuned_question_answering_6", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_chinese_finetuned_question_answering_6| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|381.0 MB| + +## References + +https://huggingface.co/jazzson/bert-base-chinese-finetuned-question-answering-6 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_english_greek_modern_cased_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_english_greek_modern_cased_en.md new file mode 100644 index 00000000000000..aa807f51a017e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_english_greek_modern_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_english_greek_modern_cased BertEmbeddings from Geotrend +author: John Snow Labs +name: bert_base_english_greek_modern_cased +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_english_greek_modern_cased` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_english_greek_modern_cased_en_5.5.0_3.0_1727162039629.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_english_greek_modern_cased_en_5.5.0_3.0_1727162039629.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_base_english_greek_modern_cased","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_base_english_greek_modern_cased","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_english_greek_modern_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|408.4 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-el-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_english_greek_modern_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_english_greek_modern_cased_pipeline_en.md new file mode 100644 index 00000000000000..531b76490a8e47 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_english_greek_modern_cased_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_english_greek_modern_cased_pipeline pipeline BertEmbeddings from Geotrend +author: John Snow Labs +name: bert_base_english_greek_modern_cased_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_english_greek_modern_cased_pipeline` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_english_greek_modern_cased_pipeline_en_5.5.0_3.0_1727162061121.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_english_greek_modern_cased_pipeline_en_5.5.0_3.0_1727162061121.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_english_greek_modern_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_english_greek_modern_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_english_greek_modern_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.4 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-el-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_multilingual_cased_based_encoder_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_multilingual_cased_based_encoder_pipeline_xx.md new file mode 100644 index 00000000000000..66843946d3f942 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_multilingual_cased_based_encoder_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_cased_based_encoder_pipeline pipeline BertEmbeddings from shsha0110 +author: John Snow Labs +name: bert_base_multilingual_cased_based_encoder_pipeline +date: 2024-09-24 +tags: [xx, open_source, pipeline, onnx] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_cased_based_encoder_pipeline` is a Multilingual model originally trained by shsha0110. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_based_encoder_pipeline_xx_5.5.0_3.0_1727200645774.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_based_encoder_pipeline_xx_5.5.0_3.0_1727200645774.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_multilingual_cased_based_encoder_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_multilingual_cased_based_encoder_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_cased_based_encoder_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|664.9 MB| + +## References + +https://huggingface.co/shsha0110/bert-base-multilingual-cased-based-encoder + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_multilingual_cased_based_encoder_xx.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_multilingual_cased_based_encoder_xx.md new file mode 100644 index 00000000000000..70723b56381235 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_multilingual_cased_based_encoder_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_cased_based_encoder BertEmbeddings from shsha0110 +author: John Snow Labs +name: bert_base_multilingual_cased_based_encoder +date: 2024-09-24 +tags: [xx, open_source, onnx, embeddings, bert] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_cased_based_encoder` is a Multilingual model originally trained by shsha0110. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_based_encoder_xx_5.5.0_3.0_1727200610130.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_based_encoder_xx_5.5.0_3.0_1727200610130.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_base_multilingual_cased_based_encoder","xx") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_base_multilingual_cased_based_encoder","xx") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_cased_based_encoder| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|xx| +|Size:|664.9 MB| + +## References + +https://huggingface.co/shsha0110/bert-base-multilingual-cased-based-encoder \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_paws_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_paws_en.md new file mode 100644 index 00000000000000..16a482083b41d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_paws_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_paws BertForSequenceClassification from harouzie +author: John Snow Labs +name: bert_base_paws +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_paws` is a English model originally trained by harouzie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_paws_en_5.5.0_3.0_1727218976635.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_paws_en_5.5.0_3.0_1727218976635.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_paws","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_paws", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_paws| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/harouzie/bert-base-paws \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146_en.md new file mode 100644 index 00000000000000..e419c295a5b567 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146 BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146_en_5.5.0_3.0_1727216919379.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146_en_5.5.0_3.0_1727216919379.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.420240914224146 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146_pipeline_en.md new file mode 100644 index 00000000000000..b5b6b44ea335e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146_pipeline pipeline BertForQuestionAnswering from alcalazans +author: John Snow Labs +name: bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146_pipeline` is a English model originally trained by alcalazans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146_pipeline_en_5.5.0_3.0_1727216940946.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146_pipeline_en_5.5.0_3.0_1727216940946.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_squad_v1_1_portuguese_ibama_v0_420240914224146_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/alcalazans/bert-base-squad-v1.1-pt-IBAMA_v0.420240914224146 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_emotion_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_emotion_en.md new file mode 100644 index 00000000000000..590d0e7c8f9723 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_emotion_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_emotion DistilBertForSequenceClassification from isom5240sp24 +author: John Snow Labs +name: bert_base_uncased_emotion +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_emotion` is a English model originally trained by isom5240sp24. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_emotion_en_5.5.0_3.0_1727205014862.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_emotion_en_5.5.0_3.0_1727205014862.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_base_uncased_emotion","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_base_uncased_emotion", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_emotion| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/isom5240sp24/bert-base-uncased-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en.md new file mode 100644 index 00000000000000..14c5da998550f9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en_5.5.0_3.0_1727175500437.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en_5.5.0_3.0_1727175500437.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-1.0-b-32-lr-8e-07-dp-0.5-ss-0-st-True-fh-False-hs-0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md new file mode 100644 index 00000000000000..078bd909b5058c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en_5.5.0_3.0_1727175521932.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline_en_5.5.0_3.0_1727175521932.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_1_0_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-1.0-b-32-lr-8e-07-dp-0.5-ss-0-st-True-fh-False-hs-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0_en.md new file mode 100644 index 00000000000000..5cc6dd6f85dc90 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0_en_5.5.0_3.0_1727176070330.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0_en_5.5.0_3.0_1727176070330.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-1.45-b-32-lr-4e-06-dp-0.1-ss-300-st-False-fh-True-hs-0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0_pipeline_en.md new file mode 100644 index 00000000000000..b997466c26921e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0_pipeline_en_5.5.0_3.0_1727176091440.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0_pipeline_en_5.5.0_3.0_1727176091440.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_1_45_b_32_lr_4e_06_dp_0_1_swati_300_southern_sotho_false_fh_true_hs_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-1.45-b-32-lr-4e-06-dp-0.1-ss-300-st-False-fh-True-hs-0 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_2_25_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_2_25_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en.md new file mode 100644 index 00000000000000..5ed2fe4095072d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_2_25_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_ep_2_25_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_2_25_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_2_25_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_25_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en_5.5.0_3.0_1727175918804.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_2_25_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en_5.5.0_3.0_1727175918804.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_2_25_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_2_25_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_2_25_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-2.25-b-32-lr-8e-07-dp-0.5-ss-0-st-True-fh-False-hs-0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0_en.md new file mode 100644 index 00000000000000..416a3de611bbdd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0_en_5.5.0_3.0_1727175786858.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0_en_5.5.0_3.0_1727175786858.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_3_11_b_32_lr_8e_07_dp_0_5_swati_700_southern_sotho_false_fh_true_hs_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-3.11-b-32-lr-8e-07-dp-0.5-ss-700-st-False-fh-True-hs-0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800_pipeline_en.md new file mode 100644 index 00000000000000..ab2c99f6807bc2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800_pipeline_en_5.5.0_3.0_1727163220743.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800_pipeline_en_5.5.0_3.0_1727163220743.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_3_44_b_32_lr_8e_07_dp_0_5_swati_0_southern_sotho_false_fh_false_hs_800_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-3.44-b-32-lr-8e-07-dp-0.5-ss-0-st-False-fh-False-hs-800 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en.md new file mode 100644 index 00000000000000..35e46589b89d3a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en_5.5.0_3.0_1727163813410.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0_en_5.5.0_3.0_1727163813410.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ep_4_87_b_32_lr_4e_07_dp_0_5_swati_0_southern_sotho_true_fh_false_hs_0| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-ep-4.87-b-32-lr-4e-07-dp-0.5-ss-0-st-True-fh-False-hs-0 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6_en.md new file mode 100644 index 00000000000000..8cec5288f868cf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6 BertForQuestionAnswering from anas-awadalla +author: John Snow Labs +name: bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6` is a English model originally trained by anas-awadalla. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6_en_5.5.0_3.0_1727206732278.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6_en_5.5.0_3.0_1727206732278.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/anas-awadalla/bert-base-uncased-few-shot-k-64-finetuned-squad-seed-6 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6_pipeline_en.md new file mode 100644 index 00000000000000..124f8c6705c672 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6_pipeline pipeline BertForQuestionAnswering from anas-awadalla +author: John Snow Labs +name: bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6_pipeline` is a English model originally trained by anas-awadalla. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6_pipeline_en_5.5.0_3.0_1727206753795.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6_pipeline_en_5.5.0_3.0_1727206753795.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_few_shot_k_64_finetuned_squad_seed_6_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/anas-awadalla/bert-base-uncased-few-shot-k-64-finetuned-squad-seed-6 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000_pipeline_en.md new file mode 100644 index 00000000000000..3aa865f5649e2b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000_pipeline_en_5.5.0_3.0_1727175665424.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000_pipeline_en_5.5.0_3.0_1727175665424.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_0_9_lr_1e_06_wd_0_001_dp_0_99999_swati_160000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-0.9-lr-1e-06-wd-0.001-dp-0.99999-ss-160000 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000_pipeline_en.md new file mode 100644 index 00000000000000..a4dc4a1bd3b4e4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000_pipeline_en_5.5.0_3.0_1727176210563.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000_pipeline_en_5.5.0_3.0_1727176210563.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_1e_06_wd_0_001_dp_0_99999_swati_50000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-1e-06-wd-0.001-dp-0.99999-ss-50000 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_2_0_lr_4e_06_wd_0_01_dp_0_2_swati_0_southern_sotho_true_fh_true_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_2_0_lr_4e_06_wd_0_01_dp_0_2_swati_0_southern_sotho_true_fh_true_pipeline_en.md new file mode 100644 index 00000000000000..4c90981886ae6b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_2_0_lr_4e_06_wd_0_01_dp_0_2_swati_0_southern_sotho_true_fh_true_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_2_0_lr_4e_06_wd_0_01_dp_0_2_swati_0_southern_sotho_true_fh_true_pipeline pipeline BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_2_0_lr_4e_06_wd_0_01_dp_0_2_swati_0_southern_sotho_true_fh_true_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_2_0_lr_4e_06_wd_0_01_dp_0_2_swati_0_southern_sotho_true_fh_true_pipeline` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_4e_06_wd_0_01_dp_0_2_swati_0_southern_sotho_true_fh_true_pipeline_en_5.5.0_3.0_1727175524689.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_2_0_lr_4e_06_wd_0_01_dp_0_2_swati_0_southern_sotho_true_fh_true_pipeline_en_5.5.0_3.0_1727175524689.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_4e_06_wd_0_01_dp_0_2_swati_0_southern_sotho_true_fh_true_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetune_squad_ep_2_0_lr_4e_06_wd_0_01_dp_0_2_swati_0_southern_sotho_true_fh_true_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_2_0_lr_4e_06_wd_0_01_dp_0_2_swati_0_southern_sotho_true_fh_true_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-2.0-lr-4e-06-wd-0.01-dp-0.2-ss-0-st-True-fh-True + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_800_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_800_en.md new file mode 100644 index 00000000000000..23778f1c4acec2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_800_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_800 BertForQuestionAnswering from danielkty22 +author: John Snow Labs +name: bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_800 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_800` is a English model originally trained by danielkty22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_800_en_5.5.0_3.0_1727175546565.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_800_en_5.5.0_3.0_1727175546565.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_800","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_800", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetune_squad_ep_3_44_lr_4e_07_wd_1e_05_dp_0_3_swati_0_southern_sotho_false_fh_false_hs_800| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/danielkty22/bert-base-uncased-finetune-squad-ep-3.44-lr-4e-07-wd-1e-05-dp-0.3-ss-0-st-False-fh-False-hs-800 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_imdb_shushuile_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_imdb_shushuile_en.md new file mode 100644 index 00000000000000..0cd1e8019f704b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_imdb_shushuile_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_imdb_shushuile BertEmbeddings from shushuile +author: John Snow Labs +name: bert_base_uncased_finetuned_imdb_shushuile +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_imdb_shushuile` is a English model originally trained by shushuile. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_imdb_shushuile_en_5.5.0_3.0_1727201186154.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_imdb_shushuile_en_5.5.0_3.0_1727201186154.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_base_uncased_finetuned_imdb_shushuile","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_base_uncased_finetuned_imdb_shushuile","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_imdb_shushuile| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/shushuile/bert-base-uncased-finetuned-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_imdb_shushuile_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_imdb_shushuile_pipeline_en.md new file mode 100644 index 00000000000000..02390c6e10d145 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_imdb_shushuile_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_imdb_shushuile_pipeline pipeline BertEmbeddings from shushuile +author: John Snow Labs +name: bert_base_uncased_finetuned_imdb_shushuile_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_imdb_shushuile_pipeline` is a English model originally trained by shushuile. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_imdb_shushuile_pipeline_en_5.5.0_3.0_1727201208409.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_imdb_shushuile_pipeline_en_5.5.0_3.0_1727201208409.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_imdb_shushuile_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_imdb_shushuile_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_imdb_shushuile_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/shushuile/bert-base-uncased-finetuned-imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_news_2009_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_news_2009_pipeline_en.md new file mode 100644 index 00000000000000..ce1d46add590f1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_news_2009_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_news_2009_pipeline pipeline BertEmbeddings from sally9805 +author: John Snow Labs +name: bert_base_uncased_finetuned_news_2009_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_news_2009_pipeline` is a English model originally trained by sally9805. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_news_2009_pipeline_en_5.5.0_3.0_1727177514131.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_news_2009_pipeline_en_5.5.0_3.0_1727177514131.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_news_2009_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_news_2009_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_news_2009_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/sally9805/bert-base-uncased-finetuned-news-2009 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_quac_nohistory_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_quac_nohistory_pipeline_en.md new file mode 100644 index 00000000000000..48f8242af6f720 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_quac_nohistory_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_quac_nohistory_pipeline pipeline BertForQuestionAnswering from Jellevdl +author: John Snow Labs +name: bert_base_uncased_finetuned_quac_nohistory_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_quac_nohistory_pipeline` is a English model originally trained by Jellevdl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_quac_nohistory_pipeline_en_5.5.0_3.0_1727163689623.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_quac_nohistory_pipeline_en_5.5.0_3.0_1727163689623.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_quac_nohistory_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_quac_nohistory_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_quac_nohistory_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/Jellevdl/bert-base-uncased-finetuned-quac-noHistory + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_quac_withouthistory_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_quac_withouthistory_v2_pipeline_en.md new file mode 100644 index 00000000000000..76f97aa2d5064c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_finetuned_quac_withouthistory_v2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_quac_withouthistory_v2_pipeline pipeline BertForQuestionAnswering from Jellevdl +author: John Snow Labs +name: bert_base_uncased_finetuned_quac_withouthistory_v2_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_quac_withouthistory_v2_pipeline` is a English model originally trained by Jellevdl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_quac_withouthistory_v2_pipeline_en_5.5.0_3.0_1727163158239.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_quac_withouthistory_v2_pipeline_en_5.5.0_3.0_1727163158239.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_quac_withouthistory_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_quac_withouthistory_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_quac_withouthistory_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/Jellevdl/bert-base-uncased-finetuned-quac-withoutHistory-v2 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_newscategoryclassification_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_newscategoryclassification_en.md new file mode 100644 index 00000000000000..ddbbba0216f3fb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_newscategoryclassification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_newscategoryclassification DistilBertForSequenceClassification from akashmaggon +author: John Snow Labs +name: bert_base_uncased_newscategoryclassification +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_newscategoryclassification` is a English model originally trained by akashmaggon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_newscategoryclassification_en_5.5.0_3.0_1727204776004.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_newscategoryclassification_en_5.5.0_3.0_1727204776004.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_base_uncased_newscategoryclassification","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_base_uncased_newscategoryclassification", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_newscategoryclassification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/akashmaggon/bert-base-uncased-newscategoryclassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_newscategoryclassification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_newscategoryclassification_pipeline_en.md new file mode 100644 index 00000000000000..f1d3f3fe9f1961 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_newscategoryclassification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_newscategoryclassification_pipeline pipeline DistilBertForSequenceClassification from akashmaggon +author: John Snow Labs +name: bert_base_uncased_newscategoryclassification_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_newscategoryclassification_pipeline` is a English model originally trained by akashmaggon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_newscategoryclassification_pipeline_en_5.5.0_3.0_1727204796937.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_newscategoryclassification_pipeline_en_5.5.0_3.0_1727204796937.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_newscategoryclassification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_newscategoryclassification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_newscategoryclassification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/akashmaggon/bert-base-uncased-newscategoryclassification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_scqa1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_scqa1_pipeline_en.md new file mode 100644 index 00000000000000..119e3baaa3a316 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_base_uncased_scqa1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_base_uncased_scqa1_pipeline pipeline BertForQuestionAnswering from CambridgeMolecularEngineering +author: John Snow Labs +name: bert_base_uncased_scqa1_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_scqa1_pipeline` is a English model originally trained by CambridgeMolecularEngineering. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_scqa1_pipeline_en_5.5.0_3.0_1727163156836.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_scqa1_pipeline_en_5.5.0_3.0_1727163156836.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_scqa1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_scqa1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_scqa1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/CambridgeMolecularEngineering/bert-base-uncased-scqa1 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_based_false_positive_secrets_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_based_false_positive_secrets_en.md new file mode 100644 index 00000000000000..b718bc58377ab5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_based_false_positive_secrets_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_based_false_positive_secrets DistilBertForSequenceClassification from harshvkarn +author: John Snow Labs +name: bert_based_false_positive_secrets +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_based_false_positive_secrets` is a English model originally trained by harshvkarn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_based_false_positive_secrets_en_5.5.0_3.0_1727204776135.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_based_false_positive_secrets_en_5.5.0_3.0_1727204776135.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_based_false_positive_secrets","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("bert_based_false_positive_secrets", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_based_false_positive_secrets| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/harshvkarn/bert-based-false-positive-secrets \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_based_false_positive_secrets_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_based_false_positive_secrets_pipeline_en.md new file mode 100644 index 00000000000000..9b2f9d9f53fcd3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_based_false_positive_secrets_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_based_false_positive_secrets_pipeline pipeline DistilBertForSequenceClassification from harshvkarn +author: John Snow Labs +name: bert_based_false_positive_secrets_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_based_false_positive_secrets_pipeline` is a English model originally trained by harshvkarn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_based_false_positive_secrets_pipeline_en_5.5.0_3.0_1727204796891.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_based_false_positive_secrets_pipeline_en_5.5.0_3.0_1727204796891.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_based_false_positive_secrets_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_based_false_positive_secrets_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_based_false_positive_secrets_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/harshvkarn/bert-based-false-positive-secrets + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_finetuned_arxiv_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_finetuned_arxiv_en.md new file mode 100644 index 00000000000000..115972580d717b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_finetuned_arxiv_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_finetuned_arxiv BertForSequenceClassification from AyoubChLin +author: John Snow Labs +name: bert_finetuned_arxiv +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_arxiv` is a English model originally trained by AyoubChLin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_arxiv_en_5.5.0_3.0_1727222221401.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_arxiv_en_5.5.0_3.0_1727222221401.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_finetuned_arxiv","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_finetuned_arxiv", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_arxiv| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/AyoubChLin/bert-finetuned-Arxiv \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_finetuned_squad_delayedkarma_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_finetuned_squad_delayedkarma_pipeline_en.md new file mode 100644 index 00000000000000..e9dcb873d5d637 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_finetuned_squad_delayedkarma_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bert_finetuned_squad_delayedkarma_pipeline pipeline BertForQuestionAnswering from delayedkarma +author: John Snow Labs +name: bert_finetuned_squad_delayedkarma_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_squad_delayedkarma_pipeline` is a English model originally trained by delayedkarma. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_delayedkarma_pipeline_en_5.5.0_3.0_1727175643894.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_squad_delayedkarma_pipeline_en_5.5.0_3.0_1727175643894.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_finetuned_squad_delayedkarma_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_finetuned_squad_delayedkarma_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_squad_delayedkarma_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/delayedkarma/bert-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_gemma2b_multivllm_nodropsus_0_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_gemma2b_multivllm_nodropsus_0_pipeline_en.md new file mode 100644 index 00000000000000..bcc1e936dbefe0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_gemma2b_multivllm_nodropsus_0_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_gemma2b_multivllm_nodropsus_0_pipeline pipeline DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: bert_gemma2b_multivllm_nodropsus_0_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_gemma2b_multivllm_nodropsus_0_pipeline` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_gemma2b_multivllm_nodropsus_0_pipeline_en_5.5.0_3.0_1727164275603.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_gemma2b_multivllm_nodropsus_0_pipeline_en_5.5.0_3.0_1727164275603.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_gemma2b_multivllm_nodropsus_0_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_gemma2b_multivllm_nodropsus_0_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_gemma2b_multivllm_nodropsus_0_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/BERT_gemma2b-multivllm-NodropSus_0 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_tagalog_base_uncased_ner_v1_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_tagalog_base_uncased_ner_v1_en.md new file mode 100644 index 00000000000000..68a8b54ab1ec24 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_tagalog_base_uncased_ner_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_tagalog_base_uncased_ner_v1 BertForTokenClassification from scostiniano +author: John Snow Labs +name: bert_tagalog_base_uncased_ner_v1 +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_tagalog_base_uncased_ner_v1` is a English model originally trained by scostiniano. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_tagalog_base_uncased_ner_v1_en_5.5.0_3.0_1727203607818.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_tagalog_base_uncased_ner_v1_en_5.5.0_3.0_1727203607818.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_tagalog_base_uncased_ner_v1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_tagalog_base_uncased_ner_v1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_tagalog_base_uncased_ner_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.0 MB| + +## References + +https://huggingface.co/scostiniano/bert-tagalog-base-uncased-ner-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bert_tagalog_base_uncased_ner_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bert_tagalog_base_uncased_ner_v1_pipeline_en.md new file mode 100644 index 00000000000000..daea8ebaca87f4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bert_tagalog_base_uncased_ner_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_tagalog_base_uncased_ner_v1_pipeline pipeline BertForTokenClassification from scostiniano +author: John Snow Labs +name: bert_tagalog_base_uncased_ner_v1_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_tagalog_base_uncased_ner_v1_pipeline` is a English model originally trained by scostiniano. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_tagalog_base_uncased_ner_v1_pipeline_en_5.5.0_3.0_1727203629522.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_tagalog_base_uncased_ner_v1_pipeline_en_5.5.0_3.0_1727203629522.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_tagalog_base_uncased_ner_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_tagalog_base_uncased_ner_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_tagalog_base_uncased_ner_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.0 MB| + +## References + +https://huggingface.co/scostiniano/bert-tagalog-base-uncased-ner-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bertin_base_random_es.md b/docs/_posts/ahmedlone127/2024-09-24-bertin_base_random_es.md new file mode 100644 index 00000000000000..81f9b356d44e88 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bertin_base_random_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish bertin_base_random RoBertaEmbeddings from bertin-project +author: John Snow Labs +name: bertin_base_random +date: 2024-09-24 +tags: [es, open_source, onnx, embeddings, roberta] +task: Embeddings +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bertin_base_random` is a Castilian, Spanish model originally trained by bertin-project. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bertin_base_random_es_5.5.0_3.0_1727216375790.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bertin_base_random_es_5.5.0_3.0_1727216375790.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("bertin_base_random","es") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("bertin_base_random","es") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bertin_base_random| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|es| +|Size:|231.6 MB| + +## References + +https://huggingface.co/bertin-project/bertin-base-random \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bertin_base_random_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-24-bertin_base_random_pipeline_es.md new file mode 100644 index 00000000000000..bbcb02b5a31251 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bertin_base_random_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish bertin_base_random_pipeline pipeline RoBertaEmbeddings from bertin-project +author: John Snow Labs +name: bertin_base_random_pipeline +date: 2024-09-24 +tags: [es, open_source, pipeline, onnx] +task: Embeddings +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bertin_base_random_pipeline` is a Castilian, Spanish model originally trained by bertin-project. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bertin_base_random_pipeline_es_5.5.0_3.0_1727216451962.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bertin_base_random_pipeline_es_5.5.0_3.0_1727216451962.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bertin_base_random_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bertin_base_random_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bertin_base_random_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|231.6 MB| + +## References + +https://huggingface.co/bertin-project/bertin-base-random + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bertin_roberta_base_spanish_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-24-bertin_roberta_base_spanish_pipeline_es.md new file mode 100644 index 00000000000000..7720e86b1a7271 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bertin_roberta_base_spanish_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish bertin_roberta_base_spanish_pipeline pipeline RoBertaEmbeddings from bertin-project +author: John Snow Labs +name: bertin_roberta_base_spanish_pipeline +date: 2024-09-24 +tags: [es, open_source, pipeline, onnx] +task: Embeddings +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bertin_roberta_base_spanish_pipeline` is a Castilian, Spanish model originally trained by bertin-project. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bertin_roberta_base_spanish_pipeline_es_5.5.0_3.0_1727168840178.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bertin_roberta_base_spanish_pipeline_es_5.5.0_3.0_1727168840178.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bertin_roberta_base_spanish_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bertin_roberta_base_spanish_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bertin_roberta_base_spanish_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|462.3 MB| + +## References + +https://huggingface.co/bertin-project/bertin-roberta-base-spanish + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bertis_en.md b/docs/_posts/ahmedlone127/2024-09-24-bertis_en.md new file mode 100644 index 00000000000000..da9e633d374e22 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bertis_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bertis BertForSequenceClassification from mireillfares +author: John Snow Labs +name: bertis +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bertis` is a English model originally trained by mireillfares. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bertis_en_5.5.0_3.0_1727214041559.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bertis_en_5.5.0_3.0_1727214041559.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bertis","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bertis", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bertis| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/mireillfares/BERTIS \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bertis_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bertis_pipeline_en.md new file mode 100644 index 00000000000000..21ba4f29df2716 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bertis_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bertis_pipeline pipeline BertForSequenceClassification from mireillfares +author: John Snow Labs +name: bertis_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bertis_pipeline` is a English model originally trained by mireillfares. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bertis_pipeline_en_5.5.0_3.0_1727214062959.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bertis_pipeline_en_5.5.0_3.0_1727214062959.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bertis_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bertis_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bertis_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/mireillfares/BERTIS + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bge_base_financial_matryoshka_en.md b/docs/_posts/ahmedlone127/2024-09-24-bge_base_financial_matryoshka_en.md new file mode 100644 index 00000000000000..06efba613eb589 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bge_base_financial_matryoshka_en.md @@ -0,0 +1,87 @@ +--- +layout: model +title: English bge_base_financial_matryoshka BGEEmbeddings from ValentinaKim +author: John Snow Labs +name: bge_base_financial_matryoshka +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, bge] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BGEEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bge_base_financial_matryoshka` is a English model originally trained by ValentinaKim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_en_5.5.0_3.0_1727207436216.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_en_5.5.0_3.0_1727207436216.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka","en") \ + .setInputCols(["document"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + + +val embeddings = BGEEmbeddings.pretrained("bge_base_financial_matryoshka","en") + .setInputCols(Array("document")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, embeddings)) +val data = Seq("I love spark-nlp).toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[bge]| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/ValentinaKim/bge-base-financial-matryoshka \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bge_base_financial_matryoshka_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bge_base_financial_matryoshka_pipeline_en.md new file mode 100644 index 00000000000000..8a49ae1d5c3f7c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bge_base_financial_matryoshka_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English bge_base_financial_matryoshka_pipeline pipeline BGEEmbeddings from ValentinaKim +author: John Snow Labs +name: bge_base_financial_matryoshka_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BGEEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bge_base_financial_matryoshka_pipeline` is a English model originally trained by ValentinaKim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_pipeline_en_5.5.0_3.0_1727207463772.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bge_base_financial_matryoshka_pipeline_en_5.5.0_3.0_1727207463772.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bge_base_financial_matryoshka_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bge_base_financial_matryoshka_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bge_base_financial_matryoshka_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|387.1 MB| + +## References + +https://huggingface.co/ValentinaKim/bge-base-financial-matryoshka + +## Included Models + +- DocumentAssembler +- BGEEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bio_mobilebert_en.md b/docs/_posts/ahmedlone127/2024-09-24-bio_mobilebert_en.md new file mode 100644 index 00000000000000..5af22c5ecc073a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bio_mobilebert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bio_mobilebert BertEmbeddings from nlpie +author: John Snow Labs +name: bio_mobilebert +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bio_mobilebert` is a English model originally trained by nlpie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bio_mobilebert_en_5.5.0_3.0_1727173387006.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bio_mobilebert_en_5.5.0_3.0_1727173387006.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bio_mobilebert","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bio_mobilebert","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bio_mobilebert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|92.5 MB| + +## References + +https://huggingface.co/nlpie/bio-mobilebert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-biom_albert_xxlarge_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-biom_albert_xxlarge_pipeline_en.md new file mode 100644 index 00000000000000..e7531cc8d1fe53 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-biom_albert_xxlarge_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English biom_albert_xxlarge_pipeline pipeline AlbertEmbeddings from sultan +author: John Snow Labs +name: biom_albert_xxlarge_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained AlbertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`biom_albert_xxlarge_pipeline` is a English model originally trained by sultan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/biom_albert_xxlarge_pipeline_en_5.5.0_3.0_1727220257398.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/biom_albert_xxlarge_pipeline_en_5.5.0_3.0_1727220257398.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("biom_albert_xxlarge_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("biom_albert_xxlarge_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|biom_albert_xxlarge_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|771.3 MB| + +## References + +https://huggingface.co/sultan/BioM-ALBERT-xxlarge + +## Included Models + +- DocumentAssembler +- TokenizerModel +- AlbertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-biomedroberta_finetuned_valid_testing_en.md b/docs/_posts/ahmedlone127/2024-09-24-biomedroberta_finetuned_valid_testing_en.md new file mode 100644 index 00000000000000..00d20d3c3149ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-biomedroberta_finetuned_valid_testing_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English biomedroberta_finetuned_valid_testing RoBertaForTokenClassification from pabRomero +author: John Snow Labs +name: biomedroberta_finetuned_valid_testing +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`biomedroberta_finetuned_valid_testing` is a English model originally trained by pabRomero. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/biomedroberta_finetuned_valid_testing_en_5.5.0_3.0_1727199316956.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/biomedroberta_finetuned_valid_testing_en_5.5.0_3.0_1727199316956.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("biomedroberta_finetuned_valid_testing","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("biomedroberta_finetuned_valid_testing", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|biomedroberta_finetuned_valid_testing| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/pabRomero/BioMedRoBERTa-finetuned-valid-testing \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-biomedroberta_finetuned_valid_testing_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-biomedroberta_finetuned_valid_testing_pipeline_en.md new file mode 100644 index 00000000000000..6958fbe43e4c59 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-biomedroberta_finetuned_valid_testing_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English biomedroberta_finetuned_valid_testing_pipeline pipeline RoBertaForTokenClassification from pabRomero +author: John Snow Labs +name: biomedroberta_finetuned_valid_testing_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`biomedroberta_finetuned_valid_testing_pipeline` is a English model originally trained by pabRomero. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/biomedroberta_finetuned_valid_testing_pipeline_en_5.5.0_3.0_1727199340266.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/biomedroberta_finetuned_valid_testing_pipeline_en_5.5.0_3.0_1727199340266.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("biomedroberta_finetuned_valid_testing_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("biomedroberta_finetuned_valid_testing_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|biomedroberta_finetuned_valid_testing_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.3 MB| + +## References + +https://huggingface.co/pabRomero/BioMedRoBERTa-finetuned-valid-testing + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bpe_selfies_pubchem_shard00_166_5k_en.md b/docs/_posts/ahmedlone127/2024-09-24-bpe_selfies_pubchem_shard00_166_5k_en.md new file mode 100644 index 00000000000000..546b9280742de0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bpe_selfies_pubchem_shard00_166_5k_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bpe_selfies_pubchem_shard00_166_5k RoBertaEmbeddings from seyonec +author: John Snow Labs +name: bpe_selfies_pubchem_shard00_166_5k +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bpe_selfies_pubchem_shard00_166_5k` is a English model originally trained by seyonec. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bpe_selfies_pubchem_shard00_166_5k_en_5.5.0_3.0_1727168680327.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bpe_selfies_pubchem_shard00_166_5k_en_5.5.0_3.0_1727168680327.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("bpe_selfies_pubchem_shard00_166_5k","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("bpe_selfies_pubchem_shard00_166_5k","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bpe_selfies_pubchem_shard00_166_5k| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|309.3 MB| + +## References + +https://huggingface.co/seyonec/BPE_SELFIES_PubChem_shard00_166_5k \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bpe_selfies_pubchem_shard00_166_5k_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-bpe_selfies_pubchem_shard00_166_5k_pipeline_en.md new file mode 100644 index 00000000000000..3820b3a68e5ba6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bpe_selfies_pubchem_shard00_166_5k_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bpe_selfies_pubchem_shard00_166_5k_pipeline pipeline RoBertaEmbeddings from seyonec +author: John Snow Labs +name: bpe_selfies_pubchem_shard00_166_5k_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bpe_selfies_pubchem_shard00_166_5k_pipeline` is a English model originally trained by seyonec. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bpe_selfies_pubchem_shard00_166_5k_pipeline_en_5.5.0_3.0_1727168695825.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bpe_selfies_pubchem_shard00_166_5k_pipeline_en_5.5.0_3.0_1727168695825.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bpe_selfies_pubchem_shard00_166_5k_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bpe_selfies_pubchem_shard00_166_5k_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bpe_selfies_pubchem_shard00_166_5k_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|309.3 MB| + +## References + +https://huggingface.co/seyonec/BPE_SELFIES_PubChem_shard00_166_5k + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-brwac_v1_5__checkpoint_27_100000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-brwac_v1_5__checkpoint_27_100000_pipeline_en.md new file mode 100644 index 00000000000000..5ff1b8538440c2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-brwac_v1_5__checkpoint_27_100000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English brwac_v1_5__checkpoint_27_100000_pipeline pipeline RoBertaEmbeddings from eduagarcia-temp +author: John Snow Labs +name: brwac_v1_5__checkpoint_27_100000_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`brwac_v1_5__checkpoint_27_100000_pipeline` is a English model originally trained by eduagarcia-temp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/brwac_v1_5__checkpoint_27_100000_pipeline_en_5.5.0_3.0_1727169209114.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/brwac_v1_5__checkpoint_27_100000_pipeline_en_5.5.0_3.0_1727169209114.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("brwac_v1_5__checkpoint_27_100000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("brwac_v1_5__checkpoint_27_100000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|brwac_v1_5__checkpoint_27_100000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|296.9 MB| + +## References + +https://huggingface.co/eduagarcia-temp/brwac_v1_5__checkpoint_27_100000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bsc_bio_ehr_spanish_carmen_species_es.md b/docs/_posts/ahmedlone127/2024-09-24-bsc_bio_ehr_spanish_carmen_species_es.md new file mode 100644 index 00000000000000..4015a36fbf1ef7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bsc_bio_ehr_spanish_carmen_species_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish bsc_bio_ehr_spanish_carmen_species RoBertaForTokenClassification from BSC-NLP4BIA +author: John Snow Labs +name: bsc_bio_ehr_spanish_carmen_species +date: 2024-09-24 +tags: [es, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_carmen_species` is a Castilian, Spanish model originally trained by BSC-NLP4BIA. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_carmen_species_es_5.5.0_3.0_1727151557808.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_carmen_species_es_5.5.0_3.0_1727151557808.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_carmen_species","es") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("bsc_bio_ehr_spanish_carmen_species", "es") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_carmen_species| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|438.2 MB| + +## References + +https://huggingface.co/BSC-NLP4BIA/bsc-bio-ehr-es-carmen-species \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bsc_bio_ehr_spanish_carmen_species_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-24-bsc_bio_ehr_spanish_carmen_species_pipeline_es.md new file mode 100644 index 00000000000000..b662f6f9a0b18f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bsc_bio_ehr_spanish_carmen_species_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish bsc_bio_ehr_spanish_carmen_species_pipeline pipeline RoBertaForTokenClassification from BSC-NLP4BIA +author: John Snow Labs +name: bsc_bio_ehr_spanish_carmen_species_pipeline +date: 2024-09-24 +tags: [es, open_source, pipeline, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_carmen_species_pipeline` is a Castilian, Spanish model originally trained by BSC-NLP4BIA. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_carmen_species_pipeline_es_5.5.0_3.0_1727151583876.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_carmen_species_pipeline_es_5.5.0_3.0_1727151583876.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bsc_bio_ehr_spanish_carmen_species_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bsc_bio_ehr_spanish_carmen_species_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_carmen_species_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|438.3 MB| + +## References + +https://huggingface.co/BSC-NLP4BIA/bsc-bio-ehr-es-carmen-species + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-bsc_bio_ehr_spanish_symptemist_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-24-bsc_bio_ehr_spanish_symptemist_pipeline_es.md new file mode 100644 index 00000000000000..b6fa6ca8663622 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-bsc_bio_ehr_spanish_symptemist_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish bsc_bio_ehr_spanish_symptemist_pipeline pipeline RoBertaForTokenClassification from BSC-NLP4BIA +author: John Snow Labs +name: bsc_bio_ehr_spanish_symptemist_pipeline +date: 2024-09-24 +tags: [es, open_source, pipeline, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bsc_bio_ehr_spanish_symptemist_pipeline` is a Castilian, Spanish model originally trained by BSC-NLP4BIA. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_symptemist_pipeline_es_5.5.0_3.0_1727151487083.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bsc_bio_ehr_spanish_symptemist_pipeline_es_5.5.0_3.0_1727151487083.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bsc_bio_ehr_spanish_symptemist_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bsc_bio_ehr_spanish_symptemist_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bsc_bio_ehr_spanish_symptemist_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|441.8 MB| + +## References + +https://huggingface.co/BSC-NLP4BIA/bsc-bio-ehr-es-symptemist + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa_pipeline_en.md new file mode 100644 index 00000000000000..d3736fcdcddef1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa_pipeline pipeline RoBertaEmbeddings from Erantr1 +author: John Snow Labs +name: burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa_pipeline` is a English model originally trained by Erantr1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa_pipeline_en_5.5.0_3.0_1727169097130.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa_pipeline_en_5.5.0_3.0_1727169097130.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_eli5_mlm_model_eran_t_imdb_nepal_bhasa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/Erantr1/my_awesome_eli5_mlm_model_eran_t_imdb_new + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_anatg_en.md b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_anatg_en.md new file mode 100644 index 00000000000000..041e4659eb3582 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_anatg_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_anatg DistilBertForSequenceClassification from Anatg +author: John Snow Labs +name: burmese_awesome_model_anatg +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_anatg` is a English model originally trained by Anatg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_anatg_en_5.5.0_3.0_1727164377263.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_anatg_en_5.5.0_3.0_1727164377263.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_anatg","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_anatg", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_anatg| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Anatg/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_anatg_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_anatg_pipeline_en.md new file mode 100644 index 00000000000000..ee784e9ed8198d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_anatg_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_anatg_pipeline pipeline DistilBertForSequenceClassification from Anatg +author: John Snow Labs +name: burmese_awesome_model_anatg_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_anatg_pipeline` is a English model originally trained by Anatg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_anatg_pipeline_en_5.5.0_3.0_1727164395017.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_anatg_pipeline_en_5.5.0_3.0_1727164395017.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_anatg_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_anatg_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_anatg_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Anatg/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_hyungho_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_hyungho_pipeline_en.md new file mode 100644 index 00000000000000..5aa0867e73b5b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_hyungho_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_hyungho_pipeline pipeline DistilBertForSequenceClassification from Hyungho +author: John Snow Labs +name: burmese_awesome_model_hyungho_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_hyungho_pipeline` is a English model originally trained by Hyungho. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_hyungho_pipeline_en_5.5.0_3.0_1727136951846.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_hyungho_pipeline_en_5.5.0_3.0_1727136951846.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_hyungho_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_hyungho_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_hyungho_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Hyungho/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_sharadakatla_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_sharadakatla_pipeline_en.md new file mode 100644 index 00000000000000..2d76a30d11882c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_sharadakatla_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_sharadakatla_pipeline pipeline DistilBertForSequenceClassification from sharadakatla +author: John Snow Labs +name: burmese_awesome_model_sharadakatla_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_sharadakatla_pipeline` is a English model originally trained by sharadakatla. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_sharadakatla_pipeline_en_5.5.0_3.0_1727164839121.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_sharadakatla_pipeline_en_5.5.0_3.0_1727164839121.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_sharadakatla_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_sharadakatla_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_sharadakatla_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sharadakatla/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_thayinm_en.md b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_thayinm_en.md new file mode 100644 index 00000000000000..ca3f5323d02af4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_thayinm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_model_thayinm DistilBertForSequenceClassification from thayinm +author: John Snow Labs +name: burmese_awesome_model_thayinm +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_thayinm` is a English model originally trained by thayinm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_thayinm_en_5.5.0_3.0_1727164663718.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_thayinm_en_5.5.0_3.0_1727164663718.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_thayinm","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_model_thayinm", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_thayinm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/thayinm/my_awesome_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_thayinm_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_thayinm_pipeline_en.md new file mode 100644 index 00000000000000..facb0cfdca03d8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_model_thayinm_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_model_thayinm_pipeline pipeline DistilBertForSequenceClassification from thayinm +author: John Snow Labs +name: burmese_awesome_model_thayinm_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_model_thayinm_pipeline` is a English model originally trained by thayinm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_thayinm_pipeline_en_5.5.0_3.0_1727164676551.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_model_thayinm_pipeline_en_5.5.0_3.0_1727164676551.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_model_thayinm_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_model_thayinm_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_model_thayinm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/thayinm/my_awesome_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_text_classification_jeruan3_en.md b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_text_classification_jeruan3_en.md new file mode 100644 index 00000000000000..c012723a30edd5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-burmese_awesome_text_classification_jeruan3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_text_classification_jeruan3 DistilBertForSequenceClassification from jeruan3 +author: John Snow Labs +name: burmese_awesome_text_classification_jeruan3 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_text_classification_jeruan3` is a English model originally trained by jeruan3. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_text_classification_jeruan3_en_5.5.0_3.0_1727154509727.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_text_classification_jeruan3_en_5.5.0_3.0_1727154509727.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_text_classification_jeruan3","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("burmese_awesome_text_classification_jeruan3", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_text_classification_jeruan3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/jeruan3/my-awesome-text-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-burmese_bert_question_answering_model_en.md b/docs/_posts/ahmedlone127/2024-09-24-burmese_bert_question_answering_model_en.md new file mode 100644 index 00000000000000..e73d7c0c9d58cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-burmese_bert_question_answering_model_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English burmese_bert_question_answering_model BertForQuestionAnswering from Ashkh0099 +author: John Snow Labs +name: burmese_bert_question_answering_model +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_bert_question_answering_model` is a English model originally trained by Ashkh0099. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_bert_question_answering_model_en_5.5.0_3.0_1727217061833.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_bert_question_answering_model_en_5.5.0_3.0_1727217061833.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("burmese_bert_question_answering_model","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("burmese_bert_question_answering_model", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_bert_question_answering_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/Ashkh0099/my-bert-question-answering-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-burmese_bert_question_answering_model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-burmese_bert_question_answering_model_pipeline_en.md new file mode 100644 index 00000000000000..ea66055bf32c69 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-burmese_bert_question_answering_model_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English burmese_bert_question_answering_model_pipeline pipeline BertForQuestionAnswering from Ashkh0099 +author: John Snow Labs +name: burmese_bert_question_answering_model_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_bert_question_answering_model_pipeline` is a English model originally trained by Ashkh0099. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_bert_question_answering_model_pipeline_en_5.5.0_3.0_1727217085841.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_bert_question_answering_model_pipeline_en_5.5.0_3.0_1727217085841.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_bert_question_answering_model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_bert_question_answering_model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_bert_question_answering_model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/Ashkh0099/my-bert-question-answering-model + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-busu_model_small_pipeline_ro.md b/docs/_posts/ahmedlone127/2024-09-24-busu_model_small_pipeline_ro.md new file mode 100644 index 00000000000000..a4fbadf2691ae9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-busu_model_small_pipeline_ro.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Moldavian, Moldovan, Romanian busu_model_small_pipeline pipeline WhisperForCTC from iulik-pisik +author: John Snow Labs +name: busu_model_small_pipeline +date: 2024-09-24 +tags: [ro, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ro +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`busu_model_small_pipeline` is a Moldavian, Moldovan, Romanian model originally trained by iulik-pisik. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/busu_model_small_pipeline_ro_5.5.0_3.0_1727144357527.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/busu_model_small_pipeline_ro_5.5.0_3.0_1727144357527.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("busu_model_small_pipeline", lang = "ro") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("busu_model_small_pipeline", lang = "ro") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|busu_model_small_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ro| +|Size:|1.7 GB| + +## References + +https://huggingface.co/iulik-pisik/busu_model_small + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-calc_nepal_bhasa_roberta_ep20_en.md b/docs/_posts/ahmedlone127/2024-09-24-calc_nepal_bhasa_roberta_ep20_en.md new file mode 100644 index 00000000000000..8d0265f04b03f2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-calc_nepal_bhasa_roberta_ep20_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English calc_nepal_bhasa_roberta_ep20 RoBertaForTokenClassification from vishruthnath +author: John Snow Labs +name: calc_nepal_bhasa_roberta_ep20 +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`calc_nepal_bhasa_roberta_ep20` is a English model originally trained by vishruthnath. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/calc_nepal_bhasa_roberta_ep20_en_5.5.0_3.0_1727151021605.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/calc_nepal_bhasa_roberta_ep20_en_5.5.0_3.0_1727151021605.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("calc_nepal_bhasa_roberta_ep20","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("calc_nepal_bhasa_roberta_ep20", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|calc_nepal_bhasa_roberta_ep20| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|422.6 MB| + +## References + +https://huggingface.co/vishruthnath/Calc_new_RoBERTa_ep20 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-camembert_base_fr.md b/docs/_posts/ahmedlone127/2024-09-24-camembert_base_fr.md new file mode 100644 index 00000000000000..4fe642cf4b454e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-camembert_base_fr.md @@ -0,0 +1,86 @@ +--- +layout: model +title: CamemBERT Base Model +author: John Snow Labs +name: camembert_base +date: 2024-09-24 +tags: [fr, french, embeddings, camembert, base, open_source, onnx] +task: Embeddings +language: fr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: CamemBertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +[CamemBERT](https://arxiv.org/abs/1911.03894) is a state-of-the-art language model for French based on the RoBERTa model. +For further information or requests, please go to [Camembert Website](https://camembert-model.fr/) + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/camembert_base_fr_5.5.0_3.0_1727210253431.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/camembert_base_fr_5.5.0_3.0_1727210253431.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +embeddings = CamemBertEmbeddings.pretrained("camembert_base", "fr") \ +.setInputCols("sentence", "token") \ +.setOutputCol("embeddings") +``` +```scala +val embeddings = CamemBertEmbeddings.pretrained("camembert_base", "fr") +.setInputCols("sentence", "token") +.setOutputCol("embeddings") +``` + +{:.nlu-block} +```python +import nlu +nlu.load("fr.embed.camembert_base").predict("""Put your text here.""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|camembert_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[camembert]| +|Language:|fr| +|Size:|264.0 MB| + +## Benchmarking + +```bash + + +| Model | #params | Arch. | Training data | +|--------------------------------|--------------------------------|-------|-----------------------------------| +| `camembert-base` | 110M | Base | OSCAR (138 GB of text) | +| `camembert/camembert-large` | 335M | Large | CCNet (135 GB of text) | +| `camembert/camembert-base-ccnet` | 110M | Base | CCNet (135 GB of text) | +| `camembert/camembert-base-wikipedia-4gb` | 110M | Base | Wikipedia (4 GB of text) | +| `camembert/camembert-base-oscar-4gb` | 110M | Base | Subsample of OSCAR (4 GB of text) | +| `camembert/camembert-base-ccnet-4gb` | 110M | Base | Subsample of CCNet (4 GB of text) | +``` \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-camembert_base_pipeline_fr.md b/docs/_posts/ahmedlone127/2024-09-24-camembert_base_pipeline_fr.md new file mode 100644 index 00000000000000..3e0a850cd2baab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-camembert_base_pipeline_fr.md @@ -0,0 +1,70 @@ +--- +layout: model +title: French camembert_base_pipeline pipeline CamemBertEmbeddings from almanach +author: John Snow Labs +name: camembert_base_pipeline +date: 2024-09-24 +tags: [fr, open_source, pipeline, onnx] +task: Embeddings +language: fr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained CamemBertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`camembert_base_pipeline` is a French model originally trained by almanach. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/camembert_base_pipeline_fr_5.5.0_3.0_1727210328201.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/camembert_base_pipeline_fr_5.5.0_3.0_1727210328201.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("camembert_base_pipeline", lang = "fr") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("camembert_base_pipeline", lang = "fr") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|camembert_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|fr| +|Size:|264.0 MB| + +## References + +https://huggingface.co/almanach/camembert-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- CamemBertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-camembert_embeddings_Sonny_generic_model_fr.md b/docs/_posts/ahmedlone127/2024-09-24-camembert_embeddings_Sonny_generic_model_fr.md new file mode 100644 index 00000000000000..7fcd2cf9741eed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-camembert_embeddings_Sonny_generic_model_fr.md @@ -0,0 +1,98 @@ +--- +layout: model +title: French CamemBert Embeddings (from Sonny) +author: John Snow Labs +name: camembert_embeddings_Sonny_generic_model +date: 2024-09-24 +tags: [fr, open_source, camembert, embeddings, onnx] +task: Embeddings +language: fr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: CamemBertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained CamemBert Embeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `dummy-model` is a French model orginally trained by `Sonny`. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/camembert_embeddings_Sonny_generic_model_fr_5.5.0_3.0_1727210256163.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/camembert_embeddings_Sonny_generic_model_fr_5.5.0_3.0_1727210256163.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = CamemBertEmbeddings.pretrained("camembert_embeddings_Sonny_generic_model","fr") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, embeddings]) + +data = spark.createDataFrame([["J'adore Spark NLP"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = CamemBertEmbeddings.pretrained("camembert_embeddings_Sonny_generic_model","fr") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) + +val data = Seq("J'adore Spark NLP").toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|camembert_embeddings_Sonny_generic_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[camembert]| +|Language:|fr| +|Size:|264.0 MB| + +## References + +References + +- https://huggingface.co/Sonny/dummy-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-camembert_french_legal_fr.md b/docs/_posts/ahmedlone127/2024-09-24-camembert_french_legal_fr.md new file mode 100644 index 00000000000000..59f1c7f1560589 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-camembert_french_legal_fr.md @@ -0,0 +1,100 @@ +--- +layout: model +title: French Legal CamemBert Embeddings Model +author: John Snow Labs +name: camembert_french_legal +date: 2024-09-24 +tags: [open_source, camembert_embeddings, camembertformaskedlm, fr, onnx] +task: Embeddings +language: fr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: CamemBertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained CamemBertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `legal-camembert` is a French model originally trained by `maastrichtlawtech`. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/camembert_french_legal_fr_5.5.0_3.0_1727210205552.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/camembert_french_legal_fr_5.5.0_3.0_1727210205552.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = CamemBertEmbeddings.pretrained("camembert_french_legal","fr") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") \ + .setCaseSensitive(True) + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, embeddings]) + +data = spark.createDataFrame([["J'adore Spark NLP"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val embeddings = CamemBertEmbeddings.pretrained("camembert_french_legal","fr") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + .setCaseSensitive(True) + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) + +val data = Seq("J'adore Spark NLP").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|camembert_french_legal| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[camembert]| +|Language:|fr| +|Size:|412.8 MB| + +## References + +References + +https://huggingface.co/maastrichtlawtech/legal-camembert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-camembert_french_legal_pipeline_fr.md b/docs/_posts/ahmedlone127/2024-09-24-camembert_french_legal_pipeline_fr.md new file mode 100644 index 00000000000000..b0d219e318038d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-camembert_french_legal_pipeline_fr.md @@ -0,0 +1,70 @@ +--- +layout: model +title: French camembert_french_legal_pipeline pipeline CamemBertEmbeddings from maastrichtlawtech +author: John Snow Labs +name: camembert_french_legal_pipeline +date: 2024-09-24 +tags: [fr, open_source, pipeline, onnx] +task: Embeddings +language: fr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained CamemBertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`camembert_french_legal_pipeline` is a French model originally trained by maastrichtlawtech. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/camembert_french_legal_pipeline_fr_5.5.0_3.0_1727210226725.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/camembert_french_legal_pipeline_fr_5.5.0_3.0_1727210226725.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("camembert_french_legal_pipeline", lang = "fr") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("camembert_french_legal_pipeline", lang = "fr") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|camembert_french_legal_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|fr| +|Size:|412.9 MB| + +## References + +https://huggingface.co/maastrichtlawtech/legal-camembert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- CamemBertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-chinese_roberta_wwm_ext_large_finetuned_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-chinese_roberta_wwm_ext_large_finetuned_ner_pipeline_en.md new file mode 100644 index 00000000000000..1190dec77e5b3f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-chinese_roberta_wwm_ext_large_finetuned_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English chinese_roberta_wwm_ext_large_finetuned_ner_pipeline pipeline BertForTokenClassification from HYM +author: John Snow Labs +name: chinese_roberta_wwm_ext_large_finetuned_ner_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`chinese_roberta_wwm_ext_large_finetuned_ner_pipeline` is a English model originally trained by HYM. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/chinese_roberta_wwm_ext_large_finetuned_ner_pipeline_en_5.5.0_3.0_1727203754472.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/chinese_roberta_wwm_ext_large_finetuned_ner_pipeline_en_5.5.0_3.0_1727203754472.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("chinese_roberta_wwm_ext_large_finetuned_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("chinese_roberta_wwm_ext_large_finetuned_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|chinese_roberta_wwm_ext_large_finetuned_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/HYM/chinese-roberta-wwm-ext-large-finetuned-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-cl_arabertv0_1_base_33379_arabic_tydiqa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-cl_arabertv0_1_base_33379_arabic_tydiqa_pipeline_en.md new file mode 100644 index 00000000000000..b05e372ff5ac3a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-cl_arabertv0_1_base_33379_arabic_tydiqa_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English cl_arabertv0_1_base_33379_arabic_tydiqa_pipeline pipeline BertForQuestionAnswering from MatMulMan +author: John Snow Labs +name: cl_arabertv0_1_base_33379_arabic_tydiqa_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cl_arabertv0_1_base_33379_arabic_tydiqa_pipeline` is a English model originally trained by MatMulMan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cl_arabertv0_1_base_33379_arabic_tydiqa_pipeline_en_5.5.0_3.0_1727216854587.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cl_arabertv0_1_base_33379_arabic_tydiqa_pipeline_en_5.5.0_3.0_1727216854587.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cl_arabertv0_1_base_33379_arabic_tydiqa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cl_arabertv0_1_base_33379_arabic_tydiqa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cl_arabertv0_1_base_33379_arabic_tydiqa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|505.0 MB| + +## References + +https://huggingface.co/MatMulMan/CL-AraBERTv0.1-base-33379-arabic_tydiqa + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-classification_multi_label_des_crimes_ar.md b/docs/_posts/ahmedlone127/2024-09-24-classification_multi_label_des_crimes_ar.md new file mode 100644 index 00000000000000..4a20fa29fb59c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-classification_multi_label_des_crimes_ar.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Arabic classification_multi_label_des_crimes BertForSequenceClassification from fatttty +author: John Snow Labs +name: classification_multi_label_des_crimes +date: 2024-09-24 +tags: [ar, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classification_multi_label_des_crimes` is a Arabic model originally trained by fatttty. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classification_multi_label_des_crimes_ar_5.5.0_3.0_1727222144533.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classification_multi_label_des_crimes_ar_5.5.0_3.0_1727222144533.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("classification_multi_label_des_crimes","ar") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("classification_multi_label_des_crimes", "ar") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classification_multi_label_des_crimes| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|ar| +|Size:|508.7 MB| + +## References + +https://huggingface.co/fatttty/classification_multi_label_des_crimes \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-classification_multi_label_des_crimes_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-24-classification_multi_label_des_crimes_pipeline_ar.md new file mode 100644 index 00000000000000..373bc6fae4add6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-classification_multi_label_des_crimes_pipeline_ar.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Arabic classification_multi_label_des_crimes_pipeline pipeline BertForSequenceClassification from fatttty +author: John Snow Labs +name: classification_multi_label_des_crimes_pipeline +date: 2024-09-24 +tags: [ar, open_source, pipeline, onnx] +task: Text Classification +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classification_multi_label_des_crimes_pipeline` is a Arabic model originally trained by fatttty. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classification_multi_label_des_crimes_pipeline_ar_5.5.0_3.0_1727222170898.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classification_multi_label_des_crimes_pipeline_ar_5.5.0_3.0_1727222170898.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("classification_multi_label_des_crimes_pipeline", lang = "ar") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("classification_multi_label_des_crimes_pipeline", lang = "ar") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classification_multi_label_des_crimes_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|508.7 MB| + +## References + +https://huggingface.co/fatttty/classification_multi_label_des_crimes + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-classification_tagging_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-classification_tagging_pipeline_en.md new file mode 100644 index 00000000000000..feeb49c38ca78c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-classification_tagging_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English classification_tagging_pipeline pipeline BertEmbeddings from kumarsonu +author: John Snow Labs +name: classification_tagging_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classification_tagging_pipeline` is a English model originally trained by kumarsonu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classification_tagging_pipeline_en_5.5.0_3.0_1727177692643.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classification_tagging_pipeline_en_5.5.0_3.0_1727177692643.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("classification_tagging_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("classification_tagging_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classification_tagging_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/kumarsonu/Classification_Tagging + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-clip_vit_large_patch14_en.md b/docs/_posts/ahmedlone127/2024-09-24-clip_vit_large_patch14_en.md new file mode 100644 index 00000000000000..a9e0a48916c7e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-clip_vit_large_patch14_en.md @@ -0,0 +1,120 @@ +--- +layout: model +title: English clip_vit_large_patch14 CLIPForZeroShotClassification from openai +author: John Snow Labs +name: clip_vit_large_patch14 +date: 2024-09-24 +tags: [en, open_source, onnx, zero_shot, clip, image] +task: Zero-Shot Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: CLIPForZeroShotClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained CLIPForZeroShotClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`clip_vit_large_patch14` is a English model originally trained by openai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clip_vit_large_patch14_en_5.5.0_3.0_1727207942979.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clip_vit_large_patch14_en_5.5.0_3.0_1727207942979.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +imageDF = spark.read \ + .format("image") \ + .option("dropInvalid", value = True) \ + .load("src/test/resources/image/") + +candidateLabels = [ + "a photo of a bird", + "a photo of a cat", + "a photo of a dog", + "a photo of a hen", + "a photo of a hippo", + "a photo of a room", + "a photo of a tractor", + "a photo of an ostrich", + "a photo of an ox"] + +ImageAssembler = ImageAssembler() \ + .setInputCol("image") \ + .setOutputCol("image_assembler") + +imageClassifier = CLIPForZeroShotClassification.pretrained("clip_vit_large_patch14","en") \ + .setInputCols(["image_assembler"]) \ + .setOutputCol("label") \ + .setCandidateLabels(candidateLabels) + +pipeline = Pipeline().setStages([ImageAssembler, imageClassifier]) +pipelineModel = pipeline.fit(imageDF) +pipelineDF = pipelineModel.transform(imageDF) + + +``` +```scala + + +val imageDF = ResourceHelper.spark.read + .format("image") + .option("dropInvalid", value = true) + .load("src/test/resources/image/") + +val candidateLabels = Array( + "a photo of a bird", + "a photo of a cat", + "a photo of a dog", + "a photo of a hen", + "a photo of a hippo", + "a photo of a room", + "a photo of a tractor", + "a photo of an ostrich", + "a photo of an ox") + +val imageAssembler = new ImageAssembler() + .setInputCol("image") + .setOutputCol("image_assembler") + +val imageClassifier = CLIPForZeroShotClassification.pretrained("clip_vit_large_patch14","en") \ + .setInputCols(Array("image_assembler")) \ + .setOutputCol("label") \ + .setCandidateLabels(candidateLabels) + +val pipeline = new Pipeline().setStages(Array(imageAssembler, imageClassifier)) +val pipelineModel = pipeline.fit(imageDF) +val pipelineDF = pipelineModel.transform(imageDF) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clip_vit_large_patch14| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[image_assembler]| +|Output Labels:|[label]| +|Language:|en| +|Size:|1.1 GB| + +## References + +https://huggingface.co/openai/clip-vit-large-patch14 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-code_search_codebert_base_random_trimmed_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-code_search_codebert_base_random_trimmed_pipeline_en.md new file mode 100644 index 00000000000000..4be6be30b58b60 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-code_search_codebert_base_random_trimmed_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English code_search_codebert_base_random_trimmed_pipeline pipeline RoBertaForTokenClassification from DianaIulia +author: John Snow Labs +name: code_search_codebert_base_random_trimmed_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`code_search_codebert_base_random_trimmed_pipeline` is a English model originally trained by DianaIulia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/code_search_codebert_base_random_trimmed_pipeline_en_5.5.0_3.0_1727150992459.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/code_search_codebert_base_random_trimmed_pipeline_en_5.5.0_3.0_1727150992459.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("code_search_codebert_base_random_trimmed_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("code_search_codebert_base_random_trimmed_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|code_search_codebert_base_random_trimmed_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.2 MB| + +## References + +https://huggingface.co/DianaIulia/code_search_codebert_base_random_trimmed + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-code_search_codebert_base_up_down_1_trimmed_en.md b/docs/_posts/ahmedlone127/2024-09-24-code_search_codebert_base_up_down_1_trimmed_en.md new file mode 100644 index 00000000000000..47eed14df08dbd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-code_search_codebert_base_up_down_1_trimmed_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English code_search_codebert_base_up_down_1_trimmed RoBertaForTokenClassification from DianaIulia +author: John Snow Labs +name: code_search_codebert_base_up_down_1_trimmed +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`code_search_codebert_base_up_down_1_trimmed` is a English model originally trained by DianaIulia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/code_search_codebert_base_up_down_1_trimmed_en_5.5.0_3.0_1727139369172.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/code_search_codebert_base_up_down_1_trimmed_en_5.5.0_3.0_1727139369172.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("code_search_codebert_base_up_down_1_trimmed","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("code_search_codebert_base_up_down_1_trimmed", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|code_search_codebert_base_up_down_1_trimmed| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/DianaIulia/code_search_codebert_base_up_down_1_trimmed \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-codebert_java_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-codebert_java_pipeline_en.md new file mode 100644 index 00000000000000..fd5c683ee5f943 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-codebert_java_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English codebert_java_pipeline pipeline RoBertaEmbeddings from neulab +author: John Snow Labs +name: codebert_java_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`codebert_java_pipeline` is a English model originally trained by neulab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/codebert_java_pipeline_en_5.5.0_3.0_1727216315272.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/codebert_java_pipeline_en_5.5.0_3.0_1727216315272.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("codebert_java_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("codebert_java_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|codebert_java_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.0 MB| + +## References + +https://huggingface.co/neulab/codebert-java + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-conflibert_cont_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-conflibert_cont_cased_pipeline_en.md new file mode 100644 index 00000000000000..c0acf6f8db2f25 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-conflibert_cont_cased_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English conflibert_cont_cased_pipeline pipeline BertEmbeddings from snowood1 +author: John Snow Labs +name: conflibert_cont_cased_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`conflibert_cont_cased_pipeline` is a English model originally trained by snowood1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/conflibert_cont_cased_pipeline_en_5.5.0_3.0_1727220709179.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/conflibert_cont_cased_pipeline_en_5.5.0_3.0_1727220709179.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("conflibert_cont_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("conflibert_cont_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|conflibert_cont_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|402.9 MB| + +## References + +https://huggingface.co/snowood1/ConfliBERT-cont-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-conflibert_cont_uncased_en.md b/docs/_posts/ahmedlone127/2024-09-24-conflibert_cont_uncased_en.md new file mode 100644 index 00000000000000..a038e41e4bf7aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-conflibert_cont_uncased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English conflibert_cont_uncased BertEmbeddings from snowood1 +author: John Snow Labs +name: conflibert_cont_uncased +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`conflibert_cont_uncased` is a English model originally trained by snowood1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/conflibert_cont_uncased_en_5.5.0_3.0_1727221175110.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/conflibert_cont_uncased_en_5.5.0_3.0_1727221175110.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("conflibert_cont_uncased","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("conflibert_cont_uncased","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|conflibert_cont_uncased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|406.8 MB| + +## References + +https://huggingface.co/snowood1/ConfliBERT-cont-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-conflibert_cont_uncased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-conflibert_cont_uncased_pipeline_en.md new file mode 100644 index 00000000000000..3760c07f3d59e2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-conflibert_cont_uncased_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English conflibert_cont_uncased_pipeline pipeline BertEmbeddings from snowood1 +author: John Snow Labs +name: conflibert_cont_uncased_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`conflibert_cont_uncased_pipeline` is a English model originally trained by snowood1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/conflibert_cont_uncased_pipeline_en_5.5.0_3.0_1727221195692.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/conflibert_cont_uncased_pipeline_en_5.5.0_3.0_1727221195692.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("conflibert_cont_uncased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("conflibert_cont_uncased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|conflibert_cont_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.9 MB| + +## References + +https://huggingface.co/snowood1/ConfliBERT-cont-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-contaminationquestionanswering_en.md b/docs/_posts/ahmedlone127/2024-09-24-contaminationquestionanswering_en.md new file mode 100644 index 00000000000000..ed87debf5d4a0d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-contaminationquestionanswering_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English contaminationquestionanswering DistilBertForQuestionAnswering from Shushant +author: John Snow Labs +name: contaminationquestionanswering +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, distilbert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`contaminationquestionanswering` is a English model originally trained by Shushant. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/contaminationquestionanswering_en_5.5.0_3.0_1727219904234.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/contaminationquestionanswering_en_5.5.0_3.0_1727219904234.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DistilBertForQuestionAnswering.pretrained("contaminationquestionanswering","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DistilBertForQuestionAnswering.pretrained("contaminationquestionanswering", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|contaminationquestionanswering| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|247.2 MB| + +## References + +https://huggingface.co/Shushant/ContaminationQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-contaminationquestionanswering_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-contaminationquestionanswering_pipeline_en.md new file mode 100644 index 00000000000000..304bf39e316815 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-contaminationquestionanswering_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English contaminationquestionanswering_pipeline pipeline DistilBertForQuestionAnswering from Shushant +author: John Snow Labs +name: contaminationquestionanswering_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`contaminationquestionanswering_pipeline` is a English model originally trained by Shushant. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/contaminationquestionanswering_pipeline_en_5.5.0_3.0_1727219918030.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/contaminationquestionanswering_pipeline_en_5.5.0_3.0_1727219918030.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("contaminationquestionanswering_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("contaminationquestionanswering_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|contaminationquestionanswering_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/Shushant/ContaminationQuestionAnswering + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14_en.md b/docs/_posts/ahmedlone127/2024-09-24-correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14_en.md new file mode 100644 index 00000000000000..f6b7659db77131 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14 BertForTokenClassification from ali2066 +author: John Snow Labs +name: correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14 +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14` is a English model originally trained by ali2066. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14_en_5.5.0_3.0_1727203282167.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14_en_5.5.0_3.0_1727203282167.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/ali2066/correct_BERT_token_itr0_0.0001_webDiscourse_01_03_2022-15_47_14 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14_pipeline_en.md new file mode 100644 index 00000000000000..2afb6fc03d145b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14_pipeline pipeline BertForTokenClassification from ali2066 +author: John Snow Labs +name: correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14_pipeline` is a English model originally trained by ali2066. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14_pipeline_en_5.5.0_3.0_1727203304268.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14_pipeline_en_5.5.0_3.0_1727203304268.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|correct_bert_token_itr0_0_0001_webdiscourse_01_03_2022_15_47_14_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/ali2066/correct_BERT_token_itr0_0.0001_webDiscourse_01_03_2022-15_47_14 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-danish_roberta_portuguese_en.md b/docs/_posts/ahmedlone127/2024-09-24-danish_roberta_portuguese_en.md new file mode 100644 index 00000000000000..bda9ea5ab7c417 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-danish_roberta_portuguese_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English danish_roberta_portuguese RoBertaForSequenceClassification from mediabiasgroup +author: John Snow Labs +name: danish_roberta_portuguese +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`danish_roberta_portuguese` is a English model originally trained by mediabiasgroup. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/danish_roberta_portuguese_en_5.5.0_3.0_1727211564054.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/danish_roberta_portuguese_en_5.5.0_3.0_1727211564054.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("danish_roberta_portuguese","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("danish_roberta_portuguese", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|danish_roberta_portuguese| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|444.4 MB| + +## References + +https://huggingface.co/mediabiasgroup/da-roberta-pt \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-danish_roberta_portuguese_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-danish_roberta_portuguese_pipeline_en.md new file mode 100644 index 00000000000000..79643dcad33d72 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-danish_roberta_portuguese_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English danish_roberta_portuguese_pipeline pipeline RoBertaForSequenceClassification from mediabiasgroup +author: John Snow Labs +name: danish_roberta_portuguese_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`danish_roberta_portuguese_pipeline` is a English model originally trained by mediabiasgroup. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/danish_roberta_portuguese_pipeline_en_5.5.0_3.0_1727211599011.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/danish_roberta_portuguese_pipeline_en_5.5.0_3.0_1727211599011.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("danish_roberta_portuguese_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("danish_roberta_portuguese_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|danish_roberta_portuguese_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|444.5 MB| + +## References + +https://huggingface.co/mediabiasgroup/da-roberta-pt + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-db_mc2_4_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-db_mc2_4_1_pipeline_en.md new file mode 100644 index 00000000000000..fa9e08d6ff9e35 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-db_mc2_4_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English db_mc2_4_1_pipeline pipeline DistilBertForSequenceClassification from exala +author: John Snow Labs +name: db_mc2_4_1_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`db_mc2_4_1_pipeline` is a English model originally trained by exala. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/db_mc2_4_1_pipeline_en_5.5.0_3.0_1727137585316.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/db_mc2_4_1_pipeline_en_5.5.0_3.0_1727137585316.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("db_mc2_4_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("db_mc2_4_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|db_mc2_4_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/exala/db_mc2_4.1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-deberta_v2_base_japanese_ku_nlp_ja.md b/docs/_posts/ahmedlone127/2024-09-24-deberta_v2_base_japanese_ku_nlp_ja.md new file mode 100644 index 00000000000000..9b4d0e016a35a8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-deberta_v2_base_japanese_ku_nlp_ja.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Japanese deberta_v2_base_japanese_ku_nlp DeBertaEmbeddings from ku-nlp +author: John Snow Labs +name: deberta_v2_base_japanese_ku_nlp +date: 2024-09-24 +tags: [ja, open_source, onnx, embeddings, deberta] +task: Embeddings +language: ja +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DeBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v2_base_japanese_ku_nlp` is a Japanese model originally trained by ku-nlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v2_base_japanese_ku_nlp_ja_5.5.0_3.0_1727196997773.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v2_base_japanese_ku_nlp_ja_5.5.0_3.0_1727196997773.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = DeBertaEmbeddings.pretrained("deberta_v2_base_japanese_ku_nlp","ja") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = DeBertaEmbeddings.pretrained("deberta_v2_base_japanese_ku_nlp","ja") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v2_base_japanese_ku_nlp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence, token]| +|Output Labels:|[deberta]| +|Language:|ja| +|Size:|419.0 MB| + +## References + +https://huggingface.co/ku-nlp/deberta-v2-base-japanese \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-deberta_v2_base_japanese_ku_nlp_pipeline_ja.md b/docs/_posts/ahmedlone127/2024-09-24-deberta_v2_base_japanese_ku_nlp_pipeline_ja.md new file mode 100644 index 00000000000000..3ead0421a9f772 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-deberta_v2_base_japanese_ku_nlp_pipeline_ja.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Japanese deberta_v2_base_japanese_ku_nlp_pipeline pipeline DeBertaEmbeddings from ku-nlp +author: John Snow Labs +name: deberta_v2_base_japanese_ku_nlp_pipeline +date: 2024-09-24 +tags: [ja, open_source, pipeline, onnx] +task: Embeddings +language: ja +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v2_base_japanese_ku_nlp_pipeline` is a Japanese model originally trained by ku-nlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v2_base_japanese_ku_nlp_pipeline_ja_5.5.0_3.0_1727197018414.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v2_base_japanese_ku_nlp_pipeline_ja_5.5.0_3.0_1727197018414.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("deberta_v2_base_japanese_ku_nlp_pipeline", lang = "ja") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("deberta_v2_base_japanese_ku_nlp_pipeline", lang = "ja") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v2_base_japanese_ku_nlp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ja| +|Size:|419.0 MB| + +## References + +https://huggingface.co/ku-nlp/deberta-v2-base-japanese + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DeBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-deberta_v2_large_japanese_ja.md b/docs/_posts/ahmedlone127/2024-09-24-deberta_v2_large_japanese_ja.md new file mode 100644 index 00000000000000..a74e558350cd1a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-deberta_v2_large_japanese_ja.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Japanese deberta_v2_large_japanese DeBertaEmbeddings from ku-nlp +author: John Snow Labs +name: deberta_v2_large_japanese +date: 2024-09-24 +tags: [ja, open_source, onnx, embeddings, deberta] +task: Embeddings +language: ja +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DeBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v2_large_japanese` is a Japanese model originally trained by ku-nlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v2_large_japanese_ja_5.5.0_3.0_1727197101334.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v2_large_japanese_ja_5.5.0_3.0_1727197101334.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = DeBertaEmbeddings.pretrained("deberta_v2_large_japanese","ja") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = DeBertaEmbeddings.pretrained("deberta_v2_large_japanese","ja") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v2_large_japanese| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence, token]| +|Output Labels:|[deberta]| +|Language:|ja| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ku-nlp/deberta-v2-large-japanese \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-deberta_v2_large_japanese_pipeline_ja.md b/docs/_posts/ahmedlone127/2024-09-24-deberta_v2_large_japanese_pipeline_ja.md new file mode 100644 index 00000000000000..8898d22e16cab0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-deberta_v2_large_japanese_pipeline_ja.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Japanese deberta_v2_large_japanese_pipeline pipeline DeBertaEmbeddings from ku-nlp +author: John Snow Labs +name: deberta_v2_large_japanese_pipeline +date: 2024-09-24 +tags: [ja, open_source, pipeline, onnx] +task: Embeddings +language: ja +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v2_large_japanese_pipeline` is a Japanese model originally trained by ku-nlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v2_large_japanese_pipeline_ja_5.5.0_3.0_1727197163862.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v2_large_japanese_pipeline_ja_5.5.0_3.0_1727197163862.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("deberta_v2_large_japanese_pipeline", lang = "ja") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("deberta_v2_large_japanese_pipeline", lang = "ja") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v2_large_japanese_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ja| +|Size:|1.3 GB| + +## References + +https://huggingface.co/ku-nlp/deberta-v2-large-japanese + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DeBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_base_prompt_injection_protectai_en.md b/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_base_prompt_injection_protectai_en.md new file mode 100644 index 00000000000000..f83622f8c1c10f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_base_prompt_injection_protectai_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English deberta_v3_base_prompt_injection_protectai DeBertaForSequenceClassification from protectai +author: John Snow Labs +name: deberta_v3_base_prompt_injection_protectai +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, deberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DeBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v3_base_prompt_injection_protectai` is a English model originally trained by protectai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v3_base_prompt_injection_protectai_en_5.5.0_3.0_1727212657514.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v3_base_prompt_injection_protectai_en_5.5.0_3.0_1727212657514.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DeBertaForSequenceClassification.pretrained("deberta_v3_base_prompt_injection_protectai","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DeBertaForSequenceClassification.pretrained("deberta_v3_base_prompt_injection_protectai", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v3_base_prompt_injection_protectai| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|677.6 MB| + +## References + +https://huggingface.co/protectai/deberta-v3-base-prompt-injection \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_base_prompt_injection_protectai_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_base_prompt_injection_protectai_pipeline_en.md new file mode 100644 index 00000000000000..ef3676ca704fb5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_base_prompt_injection_protectai_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English deberta_v3_base_prompt_injection_protectai_pipeline pipeline DeBertaForSequenceClassification from protectai +author: John Snow Labs +name: deberta_v3_base_prompt_injection_protectai_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v3_base_prompt_injection_protectai_pipeline` is a English model originally trained by protectai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v3_base_prompt_injection_protectai_pipeline_en_5.5.0_3.0_1727212696688.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v3_base_prompt_injection_protectai_pipeline_en_5.5.0_3.0_1727212696688.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("deberta_v3_base_prompt_injection_protectai_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("deberta_v3_base_prompt_injection_protectai_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v3_base_prompt_injection_protectai_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|677.6 MB| + +## References + +https://huggingface.co/protectai/deberta-v3-base-prompt-injection + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DeBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_large_hf_weights_en.md b/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_large_hf_weights_en.md new file mode 100644 index 00000000000000..a77c95fd9b6ba7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_large_hf_weights_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English deberta_v3_large_hf_weights DeBertaEmbeddings from nagupv +author: John Snow Labs +name: deberta_v3_large_hf_weights +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, deberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DeBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v3_large_hf_weights` is a English model originally trained by nagupv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v3_large_hf_weights_en_5.5.0_3.0_1727197193591.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v3_large_hf_weights_en_5.5.0_3.0_1727197193591.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = DeBertaEmbeddings.pretrained("deberta_v3_large_hf_weights","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = DeBertaEmbeddings.pretrained("deberta_v3_large_hf_weights","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v3_large_hf_weights| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence, token]| +|Output Labels:|[deberta]| +|Language:|en| +|Size:|1.6 GB| + +## References + +https://huggingface.co/nagupv/deberta-v3-large-hf-weights \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_large_hf_weights_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_large_hf_weights_pipeline_en.md new file mode 100644 index 00000000000000..eee146bb569ea3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_large_hf_weights_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English deberta_v3_large_hf_weights_pipeline pipeline DeBertaEmbeddings from nagupv +author: John Snow Labs +name: deberta_v3_large_hf_weights_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v3_large_hf_weights_pipeline` is a English model originally trained by nagupv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v3_large_hf_weights_pipeline_en_5.5.0_3.0_1727197274114.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v3_large_hf_weights_pipeline_en_5.5.0_3.0_1727197274114.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("deberta_v3_large_hf_weights_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("deberta_v3_large_hf_weights_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v3_large_hf_weights_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.6 GB| + +## References + +https://huggingface.co/nagupv/deberta-v3-large-hf-weights + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DeBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_small_finetuned_squad_en.md b/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_small_finetuned_squad_en.md new file mode 100644 index 00000000000000..21279ac15a8d21 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_small_finetuned_squad_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English deberta_v3_small_finetuned_squad DeBertaForQuestionAnswering from mrm8488 +author: John Snow Labs +name: deberta_v3_small_finetuned_squad +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, deberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DeBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v3_small_finetuned_squad` is a English model originally trained by mrm8488. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v3_small_finetuned_squad_en_5.5.0_3.0_1727215551405.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v3_small_finetuned_squad_en_5.5.0_3.0_1727215551405.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = DeBertaForQuestionAnswering.pretrained("deberta_v3_small_finetuned_squad","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = DeBertaForQuestionAnswering.pretrained("deberta_v3_small_finetuned_squad", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v3_small_finetuned_squad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|484.5 MB| + +## References + +https://huggingface.co/mrm8488/deberta-v3-small-finetuned-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_small_finetuned_squad_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_small_finetuned_squad_pipeline_en.md new file mode 100644 index 00000000000000..8b2d409674c310 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_small_finetuned_squad_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English deberta_v3_small_finetuned_squad_pipeline pipeline DeBertaForQuestionAnswering from mrm8488 +author: John Snow Labs +name: deberta_v3_small_finetuned_squad_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v3_small_finetuned_squad_pipeline` is a English model originally trained by mrm8488. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v3_small_finetuned_squad_pipeline_en_5.5.0_3.0_1727215592183.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v3_small_finetuned_squad_pipeline_en_5.5.0_3.0_1727215592183.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("deberta_v3_small_finetuned_squad_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("deberta_v3_small_finetuned_squad_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v3_small_finetuned_squad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|484.5 MB| + +## References + +https://huggingface.co/mrm8488/deberta-v3-small-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- DeBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_xsmall_stsb_en.md b/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_xsmall_stsb_en.md new file mode 100644 index 00000000000000..5892c87324efdd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_xsmall_stsb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English deberta_v3_xsmall_stsb DeBertaForSequenceClassification from cliang1453 +author: John Snow Labs +name: deberta_v3_xsmall_stsb +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, deberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DeBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v3_xsmall_stsb` is a English model originally trained by cliang1453. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v3_xsmall_stsb_en_5.5.0_3.0_1727212651337.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v3_xsmall_stsb_en_5.5.0_3.0_1727212651337.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DeBertaForSequenceClassification.pretrained("deberta_v3_xsmall_stsb","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DeBertaForSequenceClassification.pretrained("deberta_v3_xsmall_stsb", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v3_xsmall_stsb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|207.8 MB| + +## References + +https://huggingface.co/cliang1453/deberta-v3-xsmall-stsb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_xsmall_stsb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_xsmall_stsb_pipeline_en.md new file mode 100644 index 00000000000000..21e86b1d261e5c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-deberta_v3_xsmall_stsb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English deberta_v3_xsmall_stsb_pipeline pipeline DeBertaForSequenceClassification from cliang1453 +author: John Snow Labs +name: deberta_v3_xsmall_stsb_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deberta_v3_xsmall_stsb_pipeline` is a English model originally trained by cliang1453. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deberta_v3_xsmall_stsb_pipeline_en_5.5.0_3.0_1727212684677.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deberta_v3_xsmall_stsb_pipeline_en_5.5.0_3.0_1727212684677.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("deberta_v3_xsmall_stsb_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("deberta_v3_xsmall_stsb_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deberta_v3_xsmall_stsb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|207.9 MB| + +## References + +https://huggingface.co/cliang1453/deberta-v3-xsmall-stsb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DeBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-deepset_bert_base_cased_squad2_orkg_what_5e_05_en.md b/docs/_posts/ahmedlone127/2024-09-24-deepset_bert_base_cased_squad2_orkg_what_5e_05_en.md new file mode 100644 index 00000000000000..1162738474146a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-deepset_bert_base_cased_squad2_orkg_what_5e_05_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English deepset_bert_base_cased_squad2_orkg_what_5e_05 BertForQuestionAnswering from Moussab +author: John Snow Labs +name: deepset_bert_base_cased_squad2_orkg_what_5e_05 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`deepset_bert_base_cased_squad2_orkg_what_5e_05` is a English model originally trained by Moussab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/deepset_bert_base_cased_squad2_orkg_what_5e_05_en_5.5.0_3.0_1727176079756.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/deepset_bert_base_cased_squad2_orkg_what_5e_05_en_5.5.0_3.0_1727176079756.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("deepset_bert_base_cased_squad2_orkg_what_5e_05","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("deepset_bert_base_cased_squad2_orkg_what_5e_05", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|deepset_bert_base_cased_squad2_orkg_what_5e_05| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/Moussab/deepset_bert-base-cased-squad2-orkg-what-5e-05 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-dictabert_large_he.md b/docs/_posts/ahmedlone127/2024-09-24-dictabert_large_he.md new file mode 100644 index 00000000000000..4a81e96f5a3ec5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-dictabert_large_he.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Hebrew dictabert_large BertEmbeddings from dicta-il +author: John Snow Labs +name: dictabert_large +date: 2024-09-24 +tags: [he, open_source, onnx, embeddings, bert] +task: Embeddings +language: he +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dictabert_large` is a Hebrew model originally trained by dicta-il. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dictabert_large_he_5.5.0_3.0_1727174099880.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dictabert_large_he_5.5.0_3.0_1727174099880.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("dictabert_large","he") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("dictabert_large","he") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dictabert_large| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|he| +|Size:|1.0 GB| + +## References + +https://huggingface.co/dicta-il/dictabert-large \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-did_the_doctor_call_italian_a_specialty_bert_first512_en.md b/docs/_posts/ahmedlone127/2024-09-24-did_the_doctor_call_italian_a_specialty_bert_first512_en.md new file mode 100644 index 00000000000000..75413b1c33db40 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-did_the_doctor_call_italian_a_specialty_bert_first512_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English did_the_doctor_call_italian_a_specialty_bert_first512 BertForSequenceClassification from etadevosyan +author: John Snow Labs +name: did_the_doctor_call_italian_a_specialty_bert_first512 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`did_the_doctor_call_italian_a_specialty_bert_first512` is a English model originally trained by etadevosyan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/did_the_doctor_call_italian_a_specialty_bert_first512_en_5.5.0_3.0_1727222235067.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/did_the_doctor_call_italian_a_specialty_bert_first512_en_5.5.0_3.0_1727222235067.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("did_the_doctor_call_italian_a_specialty_bert_first512","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("did_the_doctor_call_italian_a_specialty_bert_first512", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|did_the_doctor_call_italian_a_specialty_bert_first512| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|666.5 MB| + +## References + +https://huggingface.co/etadevosyan/did_the_doctor_call_it_a_specialty_bert_First512 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-did_the_doctor_call_italian_a_specialty_bert_first512_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-did_the_doctor_call_italian_a_specialty_bert_first512_pipeline_en.md new file mode 100644 index 00000000000000..0604281d719c4f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-did_the_doctor_call_italian_a_specialty_bert_first512_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English did_the_doctor_call_italian_a_specialty_bert_first512_pipeline pipeline BertForSequenceClassification from etadevosyan +author: John Snow Labs +name: did_the_doctor_call_italian_a_specialty_bert_first512_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`did_the_doctor_call_italian_a_specialty_bert_first512_pipeline` is a English model originally trained by etadevosyan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/did_the_doctor_call_italian_a_specialty_bert_first512_pipeline_en_5.5.0_3.0_1727222269094.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/did_the_doctor_call_italian_a_specialty_bert_first512_pipeline_en_5.5.0_3.0_1727222269094.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("did_the_doctor_call_italian_a_specialty_bert_first512_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("did_the_doctor_call_italian_a_specialty_bert_first512_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|did_the_doctor_call_italian_a_specialty_bert_first512_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|666.5 MB| + +## References + +https://huggingface.co/etadevosyan/did_the_doctor_call_it_a_specialty_bert_First512 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-disease_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-24-disease_classifier_en.md new file mode 100644 index 00000000000000..22d6b0a20ea9c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-disease_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English disease_classifier DistilBertForSequenceClassification from Amirth24 +author: John Snow Labs +name: disease_classifier +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`disease_classifier` is a English model originally trained by Amirth24. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/disease_classifier_en_5.5.0_3.0_1727204902517.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/disease_classifier_en_5.5.0_3.0_1727204902517.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("disease_classifier","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("disease_classifier", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|disease_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|252.6 MB| + +## References + +https://huggingface.co/Amirth24/disease_classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-dissertation_bert_en.md b/docs/_posts/ahmedlone127/2024-09-24-dissertation_bert_en.md new file mode 100644 index 00000000000000..b5b21abdf24db8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-dissertation_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English dissertation_bert BertForSequenceClassification from ohid19 +author: John Snow Labs +name: dissertation_bert +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dissertation_bert` is a English model originally trained by ohid19. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dissertation_bert_en_5.5.0_3.0_1727213690638.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dissertation_bert_en_5.5.0_3.0_1727213690638.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("dissertation_bert","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("dissertation_bert", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dissertation_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/ohid19/dissertation_bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-dissertation_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-dissertation_bert_pipeline_en.md new file mode 100644 index 00000000000000..a9a14441db95ed --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-dissertation_bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English dissertation_bert_pipeline pipeline BertForSequenceClassification from ohid19 +author: John Snow Labs +name: dissertation_bert_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dissertation_bert_pipeline` is a English model originally trained by ohid19. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dissertation_bert_pipeline_en_5.5.0_3.0_1727213711964.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dissertation_bert_pipeline_en_5.5.0_3.0_1727213711964.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("dissertation_bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("dissertation_bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dissertation_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/ohid19/dissertation_bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_indonesian_fire_classification_silvanus_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_indonesian_fire_classification_silvanus_en.md new file mode 100644 index 00000000000000..09aad0ffed3a92 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_indonesian_fire_classification_silvanus_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_indonesian_fire_classification_silvanus DistilBertForSequenceClassification from rollerhafeezh-amikom +author: John Snow Labs +name: distilbert_base_indonesian_fire_classification_silvanus +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_indonesian_fire_classification_silvanus` is a English model originally trained by rollerhafeezh-amikom. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_indonesian_fire_classification_silvanus_en_5.5.0_3.0_1727154860154.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_indonesian_fire_classification_silvanus_en_5.5.0_3.0_1727154860154.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_indonesian_fire_classification_silvanus","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_indonesian_fire_classification_silvanus", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_indonesian_fire_classification_silvanus| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|255.2 MB| + +## References + +https://huggingface.co/rollerhafeezh-amikom/distilbert-base-indonesian-fire-classification-silvanus \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_indonesian_fire_classification_silvanus_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_indonesian_fire_classification_silvanus_pipeline_en.md new file mode 100644 index 00000000000000..ba5fee150ebdab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_indonesian_fire_classification_silvanus_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_indonesian_fire_classification_silvanus_pipeline pipeline DistilBertForSequenceClassification from rollerhafeezh-amikom +author: John Snow Labs +name: distilbert_base_indonesian_fire_classification_silvanus_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_indonesian_fire_classification_silvanus_pipeline` is a English model originally trained by rollerhafeezh-amikom. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_indonesian_fire_classification_silvanus_pipeline_en_5.5.0_3.0_1727154873177.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_indonesian_fire_classification_silvanus_pipeline_en_5.5.0_3.0_1727154873177.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_indonesian_fire_classification_silvanus_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_indonesian_fire_classification_silvanus_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_indonesian_fire_classification_silvanus_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|255.3 MB| + +## References + +https://huggingface.co/rollerhafeezh-amikom/distilbert-base-indonesian-fire-classification-silvanus + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_multilingual_cased_sent_negativo_esp_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_multilingual_cased_sent_negativo_esp_pipeline_xx.md new file mode 100644 index 00000000000000..4eed197d4e9964 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_multilingual_cased_sent_negativo_esp_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual distilbert_base_multilingual_cased_sent_negativo_esp_pipeline pipeline DistilBertForSequenceClassification from rogelioplatt +author: John Snow Labs +name: distilbert_base_multilingual_cased_sent_negativo_esp_pipeline +date: 2024-09-24 +tags: [xx, open_source, pipeline, onnx] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_multilingual_cased_sent_negativo_esp_pipeline` is a Multilingual model originally trained by rogelioplatt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_sent_negativo_esp_pipeline_xx_5.5.0_3.0_1727204955319.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_multilingual_cased_sent_negativo_esp_pipeline_xx_5.5.0_3.0_1727204955319.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_multilingual_cased_sent_negativo_esp_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_multilingual_cased_sent_negativo_esp_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_multilingual_cased_sent_negativo_esp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|507.6 MB| + +## References + +https://huggingface.co/rogelioplatt/distilbert-base-multilingual-cased-Sent_Negativo_Esp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_thai_cased_finetuned_sentiment_cleaned_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_thai_cased_finetuned_sentiment_cleaned_en.md new file mode 100644 index 00000000000000..9aac433851edd7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_thai_cased_finetuned_sentiment_cleaned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_thai_cased_finetuned_sentiment_cleaned DistilBertForSequenceClassification from FlukeTJ +author: John Snow Labs +name: distilbert_base_thai_cased_finetuned_sentiment_cleaned +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_thai_cased_finetuned_sentiment_cleaned` is a English model originally trained by FlukeTJ. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_thai_cased_finetuned_sentiment_cleaned_en_5.5.0_3.0_1727137179285.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_thai_cased_finetuned_sentiment_cleaned_en_5.5.0_3.0_1727137179285.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_thai_cased_finetuned_sentiment_cleaned","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_thai_cased_finetuned_sentiment_cleaned", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_thai_cased_finetuned_sentiment_cleaned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|507.6 MB| + +## References + +https://huggingface.co/FlukeTJ/distilbert-base-thai-cased-finetuned-sentiment-cleaned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_5000_questions_gt_3_5epochs_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_5000_questions_gt_3_5epochs_en.md new file mode 100644 index 00000000000000..67c62cd497d2b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_5000_questions_gt_3_5epochs_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_5000_questions_gt_3_5epochs DistilBertForSequenceClassification from Abhibeats95 +author: John Snow Labs +name: distilbert_base_uncased_5000_questions_gt_3_5epochs +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_5000_questions_gt_3_5epochs` is a English model originally trained by Abhibeats95. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_5000_questions_gt_3_5epochs_en_5.5.0_3.0_1727137172073.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_5000_questions_gt_3_5epochs_en_5.5.0_3.0_1727137172073.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_5000_questions_gt_3_5epochs","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_5000_questions_gt_3_5epochs", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_5000_questions_gt_3_5epochs| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Abhibeats95/distilbert-base-uncased-5000_questions_gt_3_5epochs \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_fb_housing_posts_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_fb_housing_posts_en.md new file mode 100644 index 00000000000000..c6899214c193ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_fb_housing_posts_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_fb_housing_posts DistilBertForSequenceClassification from hoaj +author: John Snow Labs +name: distilbert_base_uncased_fb_housing_posts +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_fb_housing_posts` is a English model originally trained by hoaj. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_fb_housing_posts_en_5.5.0_3.0_1727164361417.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_fb_housing_posts_en_5.5.0_3.0_1727164361417.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_fb_housing_posts","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_fb_housing_posts", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_fb_housing_posts| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hoaj/distilbert-base-uncased-fb-housing-posts \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_bisoye_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_bisoye_en.md new file mode 100644 index 00000000000000..82d1faff1ba439 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_bisoye_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_bisoye DistilBertForSequenceClassification from bisoye +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_bisoye +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_bisoye` is a English model originally trained by bisoye. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_bisoye_en_5.5.0_3.0_1727154941511.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_bisoye_en_5.5.0_3.0_1727154941511.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_bisoye","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_bisoye", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_bisoye| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/bisoye/distilbert-base-uncased-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_bisoye_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_bisoye_pipeline_en.md new file mode 100644 index 00000000000000..f010890f6bbe0e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_bisoye_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_bisoye_pipeline pipeline DistilBertForSequenceClassification from bisoye +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_bisoye_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_bisoye_pipeline` is a English model originally trained by bisoye. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_bisoye_pipeline_en_5.5.0_3.0_1727154954941.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_bisoye_pipeline_en_5.5.0_3.0_1727154954941.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_bisoye_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_bisoye_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_bisoye_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/bisoye/distilbert-base-uncased-finetuned-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_nachikethmurthy666_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_nachikethmurthy666_pipeline_en.md new file mode 100644 index 00000000000000..e20f5605f60f78 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_nachikethmurthy666_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_nachikethmurthy666_pipeline pipeline DistilBertForSequenceClassification from nachikethmurthy666 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_nachikethmurthy666_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_nachikethmurthy666_pipeline` is a English model originally trained by nachikethmurthy666. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_nachikethmurthy666_pipeline_en_5.5.0_3.0_1727136840887.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_nachikethmurthy666_pipeline_en_5.5.0_3.0_1727136840887.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_nachikethmurthy666_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_nachikethmurthy666_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_nachikethmurthy666_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/nachikethmurthy666/distilbert-base-uncased-finetuned-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_woodspoon09_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_woodspoon09_en.md new file mode 100644 index 00000000000000..460596aa20fa99 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_woodspoon09_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_woodspoon09 DistilBertForSequenceClassification from woodspoon09 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_woodspoon09 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_woodspoon09` is a English model originally trained by woodspoon09. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_woodspoon09_en_5.5.0_3.0_1727154712970.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_woodspoon09_en_5.5.0_3.0_1727154712970.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_woodspoon09","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_clinc_woodspoon09", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_woodspoon09| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/woodspoon09/distilbert-base-uncased-finetuned-clinc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_woodspoon09_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_woodspoon09_pipeline_en.md new file mode 100644 index 00000000000000..a2ea1bba01ce04 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_clinc_woodspoon09_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_clinc_woodspoon09_pipeline pipeline DistilBertForSequenceClassification from woodspoon09 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_clinc_woodspoon09_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_clinc_woodspoon09_pipeline` is a English model originally trained by woodspoon09. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_woodspoon09_pipeline_en_5.5.0_3.0_1727154727002.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_clinc_woodspoon09_pipeline_en_5.5.0_3.0_1727154727002.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_woodspoon09_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_clinc_woodspoon09_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_clinc_woodspoon09_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.9 MB| + +## References + +https://huggingface.co/woodspoon09/distilbert-base-uncased-finetuned-clinc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_againeureka_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_againeureka_pipeline_en.md new file mode 100644 index 00000000000000..b9080f3c901e36 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_againeureka_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_againeureka_pipeline pipeline DistilBertForSequenceClassification from againeureka +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_againeureka_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_againeureka_pipeline` is a English model originally trained by againeureka. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_againeureka_pipeline_en_5.5.0_3.0_1727164492245.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_againeureka_pipeline_en_5.5.0_3.0_1727164492245.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_againeureka_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_againeureka_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_againeureka_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/againeureka/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_negfir_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_negfir_en.md new file mode 100644 index 00000000000000..c0321b068efd6b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_negfir_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_negfir BertForSequenceClassification from negfir +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_negfir +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_negfir` is a English model originally trained by negfir. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_negfir_en_5.5.0_3.0_1727222318264.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_negfir_en_5.5.0_3.0_1727222318264.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_negfir","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_negfir", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_negfir| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|24.2 MB| + +## References + +https://huggingface.co/negfir/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_negfir_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_negfir_pipeline_en.md new file mode 100644 index 00000000000000..2f76e92bec5050 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_negfir_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_negfir_pipeline pipeline BertForSequenceClassification from negfir +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_negfir_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_negfir_pipeline` is a English model originally trained by negfir. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_negfir_pipeline_en_5.5.0_3.0_1727222319717.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_negfir_pipeline_en_5.5.0_3.0_1727222319717.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_negfir_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_negfir_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_negfir_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|24.2 MB| + +## References + +https://huggingface.co/negfir/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_rayane321_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_rayane321_en.md new file mode 100644 index 00000000000000..88683ccb30caa6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_rayane321_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_rayane321 DistilBertForSequenceClassification from rayane321 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_rayane321 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_rayane321` is a English model originally trained by rayane321. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_rayane321_en_5.5.0_3.0_1727137183273.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_rayane321_en_5.5.0_3.0_1727137183273.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_rayane321","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_rayane321", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_rayane321| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rayane321/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_rayane321_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_rayane321_pipeline_en.md new file mode 100644 index 00000000000000..9f4ae36eadf66f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_rayane321_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_rayane321_pipeline pipeline DistilBertForSequenceClassification from rayane321 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_rayane321_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_rayane321_pipeline` is a English model originally trained by rayane321. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_rayane321_pipeline_en_5.5.0_3.0_1727137197535.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_rayane321_pipeline_en_5.5.0_3.0_1727137197535.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_rayane321_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_rayane321_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_rayane321_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/rayane321/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_robuved_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_robuved_en.md new file mode 100644 index 00000000000000..f15fc245a967ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_robuved_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_robuved DistilBertForSequenceClassification from robuved +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_robuved +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_robuved` is a English model originally trained by robuved. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_robuved_en_5.5.0_3.0_1727154839959.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_robuved_en_5.5.0_3.0_1727154839959.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_robuved","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_cola_robuved", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_robuved| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/robuved/distilbert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_wy3106714391_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_wy3106714391_pipeline_en.md new file mode 100644 index 00000000000000..bdbe6bed085306 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_cola_wy3106714391_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_cola_wy3106714391_pipeline pipeline DistilBertForSequenceClassification from wy3106714391 +author: John Snow Labs +name: distilbert_base_uncased_finetuned_cola_wy3106714391_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_cola_wy3106714391_pipeline` is a English model originally trained by wy3106714391. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_wy3106714391_pipeline_en_5.5.0_3.0_1727164164348.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_cola_wy3106714391_pipeline_en_5.5.0_3.0_1727164164348.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_cola_wy3106714391_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_cola_wy3106714391_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_cola_wy3106714391_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/wy3106714391/distilbert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000_en.md new file mode 100644 index 00000000000000..8a0f80bce0ebb1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000 DistilBertForSequenceClassification from atsstagram +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000` is a English model originally trained by atsstagram. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000_en_5.5.0_3.0_1727137041067.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000_en_5.5.0_3.0_1727137041067.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/atsstagram/distilbert-base-uncased-finetuned-emotion-balanced-1000plus3000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000_pipeline_en.md new file mode 100644 index 00000000000000..fea01fc1c3d777 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000_pipeline pipeline DistilBertForSequenceClassification from atsstagram +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000_pipeline` is a English model originally trained by atsstagram. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000_pipeline_en_5.5.0_3.0_1727137054474.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000_pipeline_en_5.5.0_3.0_1727137054474.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_balanced_1000plus3000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/atsstagram/distilbert-base-uncased-finetuned-emotion-balanced-1000plus3000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_camaganu_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_camaganu_en.md new file mode 100644 index 00000000000000..96985f76f4260b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_camaganu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_camaganu DistilBertForSequenceClassification from camaganu +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_camaganu +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_camaganu` is a English model originally trained by camaganu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_camaganu_en_5.5.0_3.0_1727164765994.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_camaganu_en_5.5.0_3.0_1727164765994.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_camaganu","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_camaganu", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_camaganu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/camaganu/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_camaganu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_camaganu_pipeline_en.md new file mode 100644 index 00000000000000..54394f87b3119c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_camaganu_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_camaganu_pipeline pipeline DistilBertForSequenceClassification from camaganu +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_camaganu_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_camaganu_pipeline` is a English model originally trained by camaganu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_camaganu_pipeline_en_5.5.0_3.0_1727164778657.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_camaganu_pipeline_en_5.5.0_3.0_1727164778657.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_camaganu_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_camaganu_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_camaganu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/camaganu/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_randomchar_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_randomchar_en.md new file mode 100644 index 00000000000000..6276cceb618c33 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_randomchar_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_randomchar DistilBertForSequenceClassification from RandomChar +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_randomchar +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_randomchar` is a English model originally trained by RandomChar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_randomchar_en_5.5.0_3.0_1727164264856.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_randomchar_en_5.5.0_3.0_1727164264856.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_randomchar","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_randomchar", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_randomchar| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/RandomChar/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_ryli_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_ryli_pipeline_en.md new file mode 100644 index 00000000000000..e5bf957c48e1e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_ryli_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_ryli_pipeline pipeline DistilBertForSequenceClassification from ryli +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_ryli_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_ryli_pipeline` is a English model originally trained by ryli. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ryli_pipeline_en_5.5.0_3.0_1727137068000.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_ryli_pipeline_en_5.5.0_3.0_1727137068000.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_ryli_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_ryli_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_ryli_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ryli/distilbert-base-uncased-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_transformersbook_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_transformersbook_en.md new file mode 100644 index 00000000000000..2f149cd6dd67b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_transformersbook_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_transformersbook DistilBertForSequenceClassification from transformersbook +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_transformersbook +date: 2024-09-24 +tags: [bert, en, open_source, sequence_classification, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_transformersbook` is a English model originally trained by transformersbook. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_transformersbook_en_5.5.0_3.0_1727155017957.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_transformersbook_en_5.5.0_3.0_1727155017957.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +tokenizer = Tokenizer()\ + .setInputCols("document")\ + .setOutputCol("token") + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_transformersbook","en")\ + .setInputCols(["document","token"])\ + .setOutputCol("class") + +pipeline = Pipeline().setStages([document_assembler, tokenizer, sequenceClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_finetuned_emotion_transformersbook","en") + .setInputCols(Array("document","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_transformersbook| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +References + +https://huggingface.co/transformersbook/distilbert-base-uncased-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_transformersbook_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_transformersbook_pipeline_en.md new file mode 100644 index 00000000000000..05d9944e13a7d2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_finetuned_emotion_transformersbook_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_finetuned_emotion_transformersbook_pipeline pipeline DistilBertForSequenceClassification from hcyying +author: John Snow Labs +name: distilbert_base_uncased_finetuned_emotion_transformersbook_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_finetuned_emotion_transformersbook_pipeline` is a English model originally trained by hcyying. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_transformersbook_pipeline_en_5.5.0_3.0_1727155031131.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_finetuned_emotion_transformersbook_pipeline_en_5.5.0_3.0_1727155031131.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_transformersbook_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_finetuned_emotion_transformersbook_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_finetuned_emotion_transformersbook_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hcyying/distilbert-base-uncased-finetuned-emotion-transformersbook + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100_pipeline_en.md new file mode 100644 index 00000000000000..1831324c346c39 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100_pipeline_en_5.5.0_3.0_1727164160521.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100_pipeline_en_5.5.0_3.0_1727164160521.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_6e4exps_0strandom42sd_ut72ut5_plprefix0stlarge42_simsp100_clean100_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_6e4exps_0strandom42sd_ut72ut5_PLPrefix0stlarge42_simsp100_clean100 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_zphr_0st13sd_ut72ut1largepfxnf_simsp300_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_zphr_0st13sd_ut72ut1largepfxnf_simsp300_pipeline_en.md new file mode 100644 index 00000000000000..8e8c5500cf7252 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_zphr_0st13sd_ut72ut1largepfxnf_simsp300_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st13sd_ut72ut1largepfxnf_simsp300_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st13sd_ut72ut1largepfxnf_simsp300_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st13sd_ut72ut1largepfxnf_simsp300_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st13sd_ut72ut1largepfxnf_simsp300_pipeline_en_5.5.0_3.0_1727164391704.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st13sd_ut72ut1largepfxnf_simsp300_pipeline_en_5.5.0_3.0_1727164391704.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st13sd_ut72ut1largepfxnf_simsp300_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st13sd_ut72ut1largepfxnf_simsp300_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st13sd_ut72ut1largepfxnf_simsp300_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st13sd_ut72ut1largePfxNf_simsp300 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_zphr_0st19sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_zphr_0st19sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline_en.md new file mode 100644 index 00000000000000..aa963f71c116aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_zphr_0st19sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st19sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st19sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st19sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st19sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline_en_5.5.0_3.0_1727164575541.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st19sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline_en_5.5.0_3.0_1727164575541.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st19sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st19sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st19sd_ut72ut5_plprefix0stlarge_simsp100_clean300_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st19sd_ut72ut5_PLPrefix0stlarge_simsp100_clean300 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp_en.md new file mode 100644 index 00000000000000..b5e482213759c0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp_en_5.5.0_3.0_1727154385187.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp_en_5.5.0_3.0_1727154385187.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st30sd_ut72ut1large30pfxnf_simsp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st30sd_ut72ut1large30PfxNf_simsp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge103_simsp_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge103_simsp_pipeline_en.md new file mode 100644 index 00000000000000..304e1f09e5fcda --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge103_simsp_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge103_simsp_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge103_simsp_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge103_simsp_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge103_simsp_pipeline_en_5.5.0_3.0_1727137502942.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge103_simsp_pipeline_en_5.5.0_3.0_1727137502942.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge103_simsp_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge103_simsp_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st42sd_ut72ut1_plprefix0stlarge103_simsp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st42sd_ut72ut1_PLPrefix0stlarge103_simsp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean4sd_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean4sd_pipeline_en.md new file mode 100644 index 00000000000000..a6b93f7844af2d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean4sd_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean4sd_pipeline pipeline DistilBertForSequenceClassification from tom192180 +author: John Snow Labs +name: distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean4sd_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean4sd_pipeline` is a English model originally trained by tom192180. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean4sd_pipeline_en_5.5.0_3.0_1727154384494.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean4sd_pipeline_en_5.5.0_3.0_1727154384494.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean4sd_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean4sd_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_odm_zphr_0st42sd_ut72ut5_plprefix0stlarge42_simsp_clean4sd_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.7 MB| + +## References + +https://huggingface.co/tom192180/distilbert-base-uncased_odm_zphr_0st42sd_ut72ut5_PLPrefix0stlarge42_simsp_clean4sd + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_coping_replies_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_coping_replies_pipeline_en.md new file mode 100644 index 00000000000000..e3582f39c837a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_coping_replies_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_coping_replies_pipeline pipeline DistilBertForSequenceClassification from coping-appraisal +author: John Snow Labs +name: distilbert_coping_replies_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_coping_replies_pipeline` is a English model originally trained by coping-appraisal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_coping_replies_pipeline_en_5.5.0_3.0_1727154284313.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_coping_replies_pipeline_en_5.5.0_3.0_1727154284313.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_coping_replies_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_coping_replies_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_coping_replies_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/coping-appraisal/distilbert-coping-replies + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_ebit_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_ebit_en.md new file mode 100644 index 00000000000000..700229f0fde8b3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_ebit_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_ebit DistilBertForSequenceClassification from lenguyen +author: John Snow Labs +name: distilbert_ebit +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_ebit` is a English model originally trained by lenguyen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_ebit_en_5.5.0_3.0_1727164524124.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_ebit_en_5.5.0_3.0_1727164524124.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_ebit","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_ebit", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_ebit| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|411.0 MB| + +## References + +https://huggingface.co/lenguyen/distilbert_EBIT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_finetuned_hatespeech_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_finetuned_hatespeech_pipeline_en.md new file mode 100644 index 00000000000000..5de5d96e55c7ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_finetuned_hatespeech_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_finetuned_hatespeech_pipeline pipeline DistilBertForSequenceClassification from ayln +author: John Snow Labs +name: distilbert_finetuned_hatespeech_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_hatespeech_pipeline` is a English model originally trained by ayln. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_hatespeech_pipeline_en_5.5.0_3.0_1727164484409.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_hatespeech_pipeline_en_5.5.0_3.0_1727164484409.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_finetuned_hatespeech_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_finetuned_hatespeech_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_hatespeech_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ayln/distilbert_finetuned_hatespeech + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_imdb_padding50model_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_imdb_padding50model_en.md new file mode 100644 index 00000000000000..406df018df71dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_imdb_padding50model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_imdb_padding50model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_imdb_padding50model +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_imdb_padding50model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_imdb_padding50model_en_5.5.0_3.0_1727154292538.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_imdb_padding50model_en_5.5.0_3.0_1727154292538.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_imdb_padding50model","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_imdb_padding50model", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_imdb_padding50model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_imdb_padding50model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_imdb_padding50model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_imdb_padding50model_pipeline_en.md new file mode 100644 index 00000000000000..a29771514a007a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_imdb_padding50model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_imdb_padding50model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_imdb_padding50model_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_imdb_padding50model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_imdb_padding50model_pipeline_en_5.5.0_3.0_1727154306597.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_imdb_padding50model_pipeline_en_5.5.0_3.0_1727154306597.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_imdb_padding50model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_imdb_padding50model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_imdb_padding50model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_imdb_padding50model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_lr_cosine_epoch_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_lr_cosine_epoch_5_pipeline_en.md new file mode 100644 index 00000000000000..abfeb2509ea67a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_lr_cosine_epoch_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_lr_cosine_epoch_5_pipeline pipeline DistilBertForSequenceClassification from K-kiron +author: John Snow Labs +name: distilbert_lr_cosine_epoch_5_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_lr_cosine_epoch_5_pipeline` is a English model originally trained by K-kiron. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_lr_cosine_epoch_5_pipeline_en_5.5.0_3.0_1727137395233.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_lr_cosine_epoch_5_pipeline_en_5.5.0_3.0_1727137395233.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_lr_cosine_epoch_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_lr_cosine_epoch_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_lr_cosine_epoch_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/K-kiron/distilbert-lr-cosine-epoch-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_ndd_html_content_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_ndd_html_content_pipeline_en.md new file mode 100644 index 00000000000000..e995ec8b7f7248 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_ndd_html_content_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_ndd_html_content_pipeline pipeline DistilBertForSequenceClassification from lgk03 +author: John Snow Labs +name: distilbert_ndd_html_content_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_ndd_html_content_pipeline` is a English model originally trained by lgk03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_ndd_html_content_pipeline_en_5.5.0_3.0_1727204801300.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_ndd_html_content_pipeline_en_5.5.0_3.0_1727204801300.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_ndd_html_content_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_ndd_html_content_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_ndd_html_content_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/lgk03/distilBERT-NDD.html.content + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_on_polarity_yelp_reviews_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_on_polarity_yelp_reviews_pipeline_en.md new file mode 100644 index 00000000000000..92dd3d07b20ab6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_on_polarity_yelp_reviews_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_on_polarity_yelp_reviews_pipeline pipeline DistilBertForSequenceClassification from BexRedpill +author: John Snow Labs +name: distilbert_on_polarity_yelp_reviews_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_on_polarity_yelp_reviews_pipeline` is a English model originally trained by BexRedpill. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_on_polarity_yelp_reviews_pipeline_en_5.5.0_3.0_1727204801364.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_on_polarity_yelp_reviews_pipeline_en_5.5.0_3.0_1727204801364.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_on_polarity_yelp_reviews_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_on_polarity_yelp_reviews_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_on_polarity_yelp_reviews_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/BexRedpill/distilbert-on-polarity-yelp-reviews + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mrpc_96_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mrpc_96_pipeline_en.md new file mode 100644 index 00000000000000..c76f3e07f2aa3a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mrpc_96_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mrpc_96_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mrpc_96_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mrpc_96_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mrpc_96_pipeline_en_5.5.0_3.0_1727154979697.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mrpc_96_pipeline_en_5.5.0_3.0_1727154979697.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mrpc_96_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mrpc_96_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_data_aug_mrpc_96_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|25.7 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_data_aug_mrpc_96 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256_en.md new file mode 100644 index 00000000000000..1b68a1c002d51e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256 DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256_en_5.5.0_3.0_1727137282939.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256_en_5.5.0_3.0_1727137282939.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_logit_kd_stsb_256| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|71.6 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_logit_kd_stsb_256 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_sanskrit_saskta_glue_experiment_mrpc_256_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_sanskrit_saskta_glue_experiment_mrpc_256_en.md new file mode 100644 index 00000000000000..4923a56f748167 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_sanskrit_saskta_glue_experiment_mrpc_256_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_mrpc_256 DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_mrpc_256 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_mrpc_256` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_mrpc_256_en_5.5.0_3.0_1727154906729.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_mrpc_256_en_5.5.0_3.0_1727154906729.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_mrpc_256","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sanskrit_saskta_glue_experiment_mrpc_256", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_mrpc_256| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|71.6 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_mrpc_256 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_sanskrit_saskta_glue_experiment_mrpc_256_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_sanskrit_saskta_glue_experiment_mrpc_256_pipeline_en.md new file mode 100644 index 00000000000000..58954cce7cbacb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_sanskrit_saskta_glue_experiment_mrpc_256_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sanskrit_saskta_glue_experiment_mrpc_256_pipeline pipeline DistilBertForSequenceClassification from gokuls +author: John Snow Labs +name: distilbert_sanskrit_saskta_glue_experiment_mrpc_256_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sanskrit_saskta_glue_experiment_mrpc_256_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_mrpc_256_pipeline_en_5.5.0_3.0_1727154910905.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sanskrit_saskta_glue_experiment_mrpc_256_pipeline_en_5.5.0_3.0_1727154910905.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_mrpc_256_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_sanskrit_saskta_glue_experiment_mrpc_256_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sanskrit_saskta_glue_experiment_mrpc_256_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|71.6 MB| + +## References + +https://huggingface.co/gokuls/distilbert_sa_GLUE_Experiment_mrpc_256 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_sentiment_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_sentiment_en.md new file mode 100644 index 00000000000000..89bdf48a367181 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_sentiment_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English distilbert_sentiment DistilBertForSequenceClassification from AbeerAlbashiti +author: John Snow Labs +name: distilbert_sentiment +date: 2024-09-24 +tags: [bert, en, open_source, sequence_classification, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sentiment` is a English model originally trained by AbeerAlbashiti. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sentiment_en_5.5.0_3.0_1727136956400.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sentiment_en_5.5.0_3.0_1727136956400.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +tokenizer = Tokenizer()\ + .setInputCols("document")\ + .setOutputCol("token") + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sentiment","en")\ + .setInputCols(["document","token"])\ + .setOutputCol("class") + +pipeline = Pipeline().setStages([document_assembler, tokenizer, sequenceClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sentiment","en") + .setInputCols(Array("document","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sentiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +References + +https://huggingface.co/AbeerAlbashiti/distilbert-sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_sst5_padding0model_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_sst5_padding0model_en.md new file mode 100644 index 00000000000000..89483168db4d1d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_sst5_padding0model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_sst5_padding0model DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_sst5_padding0model +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sst5_padding0model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sst5_padding0model_en_5.5.0_3.0_1727154953564.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sst5_padding0model_en_5.5.0_3.0_1727154953564.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sst5_padding0model","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("distilbert_sst5_padding0model", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sst5_padding0model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_sst5_padding0model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_sst5_padding0model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_sst5_padding0model_pipeline_en.md new file mode 100644 index 00000000000000..b25df17ebbd514 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_sst5_padding0model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_sst5_padding0model_pipeline pipeline DistilBertForSequenceClassification from Realgon +author: John Snow Labs +name: distilbert_sst5_padding0model_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_sst5_padding0model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_sst5_padding0model_pipeline_en_5.5.0_3.0_1727154968746.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_sst5_padding0model_pipeline_en_5.5.0_3.0_1727154968746.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_sst5_padding0model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_sst5_padding0model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_sst5_padding0model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Realgon/distilbert_sst5_padding0model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilbert_uncased_newsqa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilbert_uncased_newsqa_pipeline_en.md new file mode 100644 index 00000000000000..d4a2895dcd18fb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilbert_uncased_newsqa_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English distilbert_uncased_newsqa_pipeline pipeline DistilBertForQuestionAnswering from Prasetyow12 +author: John Snow Labs +name: distilbert_uncased_newsqa_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_uncased_newsqa_pipeline` is a English model originally trained by Prasetyow12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_uncased_newsqa_pipeline_en_5.5.0_3.0_1727219916490.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_uncased_newsqa_pipeline_en_5.5.0_3.0_1727219916490.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_uncased_newsqa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_uncased_newsqa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_uncased_newsqa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|247.3 MB| + +## References + +https://huggingface.co/Prasetyow12/distilbert-uncased-newsqa + +## Included Models + +- MultiDocumentAssembler +- DistilBertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_finetuned_wikitext2_0409nnn_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_finetuned_wikitext2_0409nnn_en.md new file mode 100644 index 00000000000000..0a946a03cc9132 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_finetuned_wikitext2_0409nnn_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilroberta_base_finetuned_wikitext2_0409nnn RoBertaEmbeddings from ntust0 +author: John Snow Labs +name: distilroberta_base_finetuned_wikitext2_0409nnn +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_finetuned_wikitext2_0409nnn` is a English model originally trained by ntust0. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_wikitext2_0409nnn_en_5.5.0_3.0_1727168787663.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_finetuned_wikitext2_0409nnn_en_5.5.0_3.0_1727168787663.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_finetuned_wikitext2_0409nnn","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("distilroberta_base_finetuned_wikitext2_0409nnn","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_finetuned_wikitext2_0409nnn| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/ntust0/distilroberta-base-finetuned-wikitext2-0409nnn \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_ft_4chan_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_ft_4chan_pipeline_en.md new file mode 100644 index 00000000000000..b60b762a24f2cb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-distilroberta_base_ft_4chan_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilroberta_base_ft_4chan_pipeline pipeline RoBertaEmbeddings from jkruk +author: John Snow Labs +name: distilroberta_base_ft_4chan_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilroberta_base_ft_4chan_pipeline` is a English model originally trained by jkruk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_4chan_pipeline_en_5.5.0_3.0_1727168963458.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilroberta_base_ft_4chan_pipeline_en_5.5.0_3.0_1727168963458.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilroberta_base_ft_4chan_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilroberta_base_ft_4chan_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilroberta_base_ft_4chan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.5 MB| + +## References + +https://huggingface.co/jkruk/distilroberta-base-ft-4chan + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-e5_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-e5_base_pipeline_en.md new file mode 100644 index 00000000000000..d82525c48b2872 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-e5_base_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English e5_base_pipeline pipeline E5Embeddings from intfloat +author: John Snow Labs +name: e5_base_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained E5Embeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`e5_base_pipeline` is a English model originally trained by intfloat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/e5_base_pipeline_en_5.5.0_3.0_1727217873798.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/e5_base_pipeline_en_5.5.0_3.0_1727217873798.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("e5_base_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("e5_base_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|e5_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|258.6 MB| + +## References + +https://huggingface.co/intfloat/e5-base + +## Included Models + +- DocumentAssembler +- E5Embeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-e5_large_en.md b/docs/_posts/ahmedlone127/2024-09-24-e5_large_en.md new file mode 100644 index 00000000000000..262cad79db1451 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-e5_large_en.md @@ -0,0 +1,73 @@ +--- +layout: model +title: E5 Large Sentence Embeddings +author: John Snow Labs +name: e5_large +date: 2024-09-24 +tags: [en, open_source, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: E5Embeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Text Embeddings by Weakly-Supervised Contrastive Pre-training. Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, Furu Wei, arXiv 2022 + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/e5_large_en_5.5.0_3.0_1727217963878.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/e5_large_en_5.5.0_3.0_1727217963878.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +embeddings =E5Embeddings.pretrained("e5_large","en") \ + .setInputCols(["documents"]) \ + .setOutputCol("instructor") + +pipeline = Pipeline().setStages([document_assembler, embeddings]) +``` +```scala +val embeddings = E5Embeddings.pretrained("e5_large","en") + .setInputCols(["document"]) + .setOutputCol("e5_embeddings") +val pipeline = new Pipeline().setStages(Array(document, embeddings)) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|e5_large| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[E5]| +|Language:|en| +|Size:|796.1 MB| + +## References + +References + +https://huggingface.co/intfloat/e5-large \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-e5_large_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-e5_large_pipeline_en.md new file mode 100644 index 00000000000000..16e365f85ce24d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-e5_large_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English e5_large_pipeline pipeline E5Embeddings from intfloat +author: John Snow Labs +name: e5_large_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained E5Embeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`e5_large_pipeline` is a English model originally trained by intfloat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/e5_large_pipeline_en_5.5.0_3.0_1727218193373.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/e5_large_pipeline_en_5.5.0_3.0_1727218193373.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("e5_large_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("e5_large_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|e5_large_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|796.1 MB| + +## References + +https://huggingface.co/intfloat/e5-large + +## Included Models + +- DocumentAssembler +- E5Embeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-e5_small_en.md b/docs/_posts/ahmedlone127/2024-09-24-e5_small_en.md new file mode 100644 index 00000000000000..c58797cf36a888 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-e5_small_en.md @@ -0,0 +1,67 @@ +--- +layout: model +title: E5 Small Sentence Embeddings +author: John Snow Labs +name: e5_small +date: 2024-09-24 +tags: [en, open_source, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: E5Embeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Text Embeddings by Weakly-Supervised Contrastive Pre-training. Liang Wang, Nan Yang, Xiaolong Huang, Binxing Jiao, Linjun Yang, Daxin Jiang, Rangan Majumder, Furu Wei, arXiv 2022 + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/e5_small_en_5.5.0_3.0_1727217734668.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/e5_small_en_5.5.0_3.0_1727217734668.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +embeddings =E5Embeddings.pretrained("e5_small","en") \ + .setInputCols(["documents"]) \ + .setOutputCol("instructor") + +pipeline = Pipeline().setStages([document_assembler, embeddings]) +``` +```scala +val embeddings = E5Embeddings.pretrained("e5_small","en") + .setInputCols(["document"]) + .setOutputCol("e5_embeddings") +val pipeline = new Pipeline().setStages(Array(document, embeddings)) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|e5_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document]| +|Output Labels:|[E5]| +|Language:|en| +|Size:|79.9 MB| \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-e5_small_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-e5_small_pipeline_en.md new file mode 100644 index 00000000000000..f21e17a3e47cf8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-e5_small_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English e5_small_pipeline pipeline E5Embeddings from intfloat +author: John Snow Labs +name: e5_small_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained E5Embeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`e5_small_pipeline` is a English model originally trained by intfloat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/e5_small_pipeline_en_5.5.0_3.0_1727217757925.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/e5_small_pipeline_en_5.5.0_3.0_1727217757925.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("e5_small_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("e5_small_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|e5_small_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|79.9 MB| + +## References + +https://huggingface.co/intfloat/e5-small + +## Included Models + +- DocumentAssembler +- E5Embeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-email_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-email_classification_pipeline_en.md new file mode 100644 index 00000000000000..21cfce4a781bcd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-email_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English email_classification_pipeline pipeline RoBertaForSequenceClassification from arya555 +author: John Snow Labs +name: email_classification_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`email_classification_pipeline` is a English model originally trained by arya555. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/email_classification_pipeline_en_5.5.0_3.0_1727171756827.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/email_classification_pipeline_en_5.5.0_3.0_1727171756827.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("email_classification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("email_classification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|email_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|427.2 MB| + +## References + +https://huggingface.co/arya555/email_classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-environmentalbert_water_en.md b/docs/_posts/ahmedlone127/2024-09-24-environmentalbert_water_en.md new file mode 100644 index 00000000000000..ddadd19d360e3f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-environmentalbert_water_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English environmentalbert_water RoBertaForSequenceClassification from ESGBERT +author: John Snow Labs +name: environmentalbert_water +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`environmentalbert_water` is a English model originally trained by ESGBERT. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/environmentalbert_water_en_5.5.0_3.0_1727168183583.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/environmentalbert_water_en_5.5.0_3.0_1727168183583.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("environmentalbert_water","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("environmentalbert_water", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|environmentalbert_water| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|309.0 MB| + +## References + +https://huggingface.co/ESGBERT/EnvironmentalBERT-water \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-environmentalbert_water_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-environmentalbert_water_pipeline_en.md new file mode 100644 index 00000000000000..a857a45ce8d21e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-environmentalbert_water_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English environmentalbert_water_pipeline pipeline RoBertaForSequenceClassification from ESGBERT +author: John Snow Labs +name: environmentalbert_water_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`environmentalbert_water_pipeline` is a English model originally trained by ESGBERT. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/environmentalbert_water_pipeline_en_5.5.0_3.0_1727168198845.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/environmentalbert_water_pipeline_en_5.5.0_3.0_1727168198845.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("environmentalbert_water_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("environmentalbert_water_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|environmentalbert_water_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|309.0 MB| + +## References + +https://huggingface.co/ESGBERT/EnvironmentalBERT-water + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finbert_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-finbert_ner_pipeline_en.md new file mode 100644 index 00000000000000..16749d41471e8b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finbert_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finbert_ner_pipeline pipeline BertForTokenClassification from Rupesh2 +author: John Snow Labs +name: finbert_ner_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finbert_ner_pipeline` is a English model originally trained by Rupesh2. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finbert_ner_pipeline_en_5.5.0_3.0_1727196344092.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finbert_ner_pipeline_en_5.5.0_3.0_1727196344092.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finbert_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finbert_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finbert_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/Rupesh2/finbert-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetuned_bert_policy_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetuned_bert_policy_classifier_en.md new file mode 100644 index 00000000000000..d7bc6d156bc546 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetuned_bert_policy_classifier_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned_bert_policy_classifier BertForSequenceClassification from aryaniyaps +author: John Snow Labs +name: finetuned_bert_policy_classifier +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_bert_policy_classifier` is a English model originally trained by aryaniyaps. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_bert_policy_classifier_en_5.5.0_3.0_1727219004416.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_bert_policy_classifier_en_5.5.0_3.0_1727219004416.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("finetuned_bert_policy_classifier","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("finetuned_bert_policy_classifier", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_bert_policy_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/aryaniyaps/finetuned-bert-policy-classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetuned_bert_policy_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetuned_bert_policy_classifier_pipeline_en.md new file mode 100644 index 00000000000000..9a0549ad146089 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetuned_bert_policy_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuned_bert_policy_classifier_pipeline pipeline BertForSequenceClassification from aryaniyaps +author: John Snow Labs +name: finetuned_bert_policy_classifier_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_bert_policy_classifier_pipeline` is a English model originally trained by aryaniyaps. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_bert_policy_classifier_pipeline_en_5.5.0_3.0_1727219026231.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_bert_policy_classifier_pipeline_en_5.5.0_3.0_1727219026231.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuned_bert_policy_classifier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuned_bert_policy_classifier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_bert_policy_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/aryaniyaps/finetuned-bert-policy-classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetuned_demo_2_shardev_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetuned_demo_2_shardev_pipeline_en.md new file mode 100644 index 00000000000000..4d41b680b6ad15 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetuned_demo_2_shardev_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuned_demo_2_shardev_pipeline pipeline DistilBertForSequenceClassification from Shardev +author: John Snow Labs +name: finetuned_demo_2_shardev_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_demo_2_shardev_pipeline` is a English model originally trained by Shardev. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_demo_2_shardev_pipeline_en_5.5.0_3.0_1727164164033.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_demo_2_shardev_pipeline_en_5.5.0_3.0_1727164164033.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuned_demo_2_shardev_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuned_demo_2_shardev_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_demo_2_shardev_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|246.0 MB| + +## References + +https://huggingface.co/Shardev/finetuned_demo_2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetuned_distilroberta_base_semeval_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetuned_distilroberta_base_semeval_en.md new file mode 100644 index 00000000000000..c3877150425537 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetuned_distilroberta_base_semeval_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned_distilroberta_base_semeval RoBertaForSequenceClassification from Youssef320 +author: John Snow Labs +name: finetuned_distilroberta_base_semeval +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_distilroberta_base_semeval` is a English model originally trained by Youssef320. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_distilroberta_base_semeval_en_5.5.0_3.0_1727172120857.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_distilroberta_base_semeval_en_5.5.0_3.0_1727172120857.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("finetuned_distilroberta_base_semeval","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("finetuned_distilroberta_base_semeval", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_distilroberta_base_semeval| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.9 MB| + +## References + +https://huggingface.co/Youssef320/finetuned-distilroberta-base-SemEval \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetunedemotionmodel_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetunedemotionmodel_pipeline_en.md new file mode 100644 index 00000000000000..fb1ac65ebc5a42 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetunedemotionmodel_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetunedemotionmodel_pipeline pipeline DistilBertForSequenceClassification from Rishabh3108 +author: John Snow Labs +name: finetunedemotionmodel_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetunedemotionmodel_pipeline` is a English model originally trained by Rishabh3108. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetunedemotionmodel_pipeline_en_5.5.0_3.0_1727164255965.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetunedemotionmodel_pipeline_en_5.5.0_3.0_1727164255965.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetunedemotionmodel_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetunedemotionmodel_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetunedemotionmodel_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Rishabh3108/finetunedemotionmodel + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3000_kaggle_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3000_kaggle_en.md new file mode 100644 index 00000000000000..09b587e0b71ff5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3000_kaggle_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_kaggle DistilBertForSequenceClassification from Munshid123 +author: John Snow Labs +name: finetuning_sentiment_model_3000_kaggle +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_kaggle` is a English model originally trained by Munshid123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_kaggle_en_5.5.0_3.0_1727154719128.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_kaggle_en_5.5.0_3.0_1727154719128.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_kaggle","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3000_kaggle", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_kaggle| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Munshid123/finetuning-sentiment-model-3000-kaggle \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3000_samples_aadrik_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3000_samples_aadrik_pipeline_en.md new file mode 100644 index 00000000000000..cbfc88fcc85464 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3000_samples_aadrik_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_aadrik_pipeline pipeline DistilBertForSequenceClassification from aadrik +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_aadrik_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_aadrik_pipeline` is a English model originally trained by aadrik. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_aadrik_pipeline_en_5.5.0_3.0_1727164275627.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_aadrik_pipeline_en_5.5.0_3.0_1727164275627.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_aadrik_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_aadrik_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_aadrik_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/aadrik/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3000_samples_dnzy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3000_samples_dnzy_pipeline_en.md new file mode 100644 index 00000000000000..7af331dc4e4477 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3000_samples_dnzy_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_dnzy_pipeline pipeline DistilBertForSequenceClassification from DNZY +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_dnzy_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_dnzy_pipeline` is a English model originally trained by DNZY. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_dnzy_pipeline_en_5.5.0_3.0_1727164741645.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_dnzy_pipeline_en_5.5.0_3.0_1727164741645.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_dnzy_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_dnzy_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_dnzy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/DNZY/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3000_samples_jbnextnext_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3000_samples_jbnextnext_pipeline_en.md new file mode 100644 index 00000000000000..dd3d95774d2a09 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3000_samples_jbnextnext_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_model_3000_samples_jbnextnext_pipeline pipeline DistilBertForSequenceClassification from jbnextnext +author: John Snow Labs +name: finetuning_sentiment_model_3000_samples_jbnextnext_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3000_samples_jbnextnext_pipeline` is a English model originally trained by jbnextnext. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_jbnextnext_pipeline_en_5.5.0_3.0_1727155056284.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3000_samples_jbnextnext_pipeline_en_5.5.0_3.0_1727155056284.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_model_3000_samples_jbnextnext_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_model_3000_samples_jbnextnext_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3000_samples_jbnextnext_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jbnextnext/finetuning-sentiment-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3500_samples_train_kurtbadelt_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3500_samples_train_kurtbadelt_en.md new file mode 100644 index 00000000000000..0c1e159c8c93e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_3500_samples_train_kurtbadelt_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_3500_samples_train_kurtbadelt DistilBertForSequenceClassification from KurtBadelt +author: John Snow Labs +name: finetuning_sentiment_model_3500_samples_train_kurtbadelt +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_3500_samples_train_kurtbadelt` is a English model originally trained by KurtBadelt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3500_samples_train_kurtbadelt_en_5.5.0_3.0_1727154263260.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_3500_samples_train_kurtbadelt_en_5.5.0_3.0_1727154263260.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3500_samples_train_kurtbadelt","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_3500_samples_train_kurtbadelt", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_3500_samples_train_kurtbadelt| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/KurtBadelt/finetuning-sentiment-model-3500-samples-train \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_fifa_15766_samples_en.md b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_fifa_15766_samples_en.md new file mode 100644 index 00000000000000..a4a1bbfe19ce37 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-finetuning_sentiment_model_fifa_15766_samples_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_model_fifa_15766_samples DistilBertForSequenceClassification from mdelrosa13 +author: John Snow Labs +name: finetuning_sentiment_model_fifa_15766_samples +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_model_fifa_15766_samples` is a English model originally trained by mdelrosa13. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_fifa_15766_samples_en_5.5.0_3.0_1727154813159.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_model_fifa_15766_samples_en_5.5.0_3.0_1727154813159.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_fifa_15766_samples","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("finetuning_sentiment_model_fifa_15766_samples", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_model_fifa_15766_samples| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/mdelrosa13/finetuning-sentiment-model-fifa-15766-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-food_not_food_distill_bert_en.md b/docs/_posts/ahmedlone127/2024-09-24-food_not_food_distill_bert_en.md new file mode 100644 index 00000000000000..3c8db4e3fa20b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-food_not_food_distill_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English food_not_food_distill_bert DistilBertForSequenceClassification from ImpactTom6819 +author: John Snow Labs +name: food_not_food_distill_bert +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`food_not_food_distill_bert` is a English model originally trained by ImpactTom6819. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/food_not_food_distill_bert_en_5.5.0_3.0_1727205005327.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/food_not_food_distill_bert_en_5.5.0_3.0_1727205005327.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("food_not_food_distill_bert","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("food_not_food_distill_bert", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|food_not_food_distill_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.4 MB| + +## References + +https://huggingface.co/ImpactTom6819/food_not_food_distill-bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-food_not_food_distill_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-food_not_food_distill_bert_pipeline_en.md new file mode 100644 index 00000000000000..99e25a66a36f01 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-food_not_food_distill_bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English food_not_food_distill_bert_pipeline pipeline DistilBertForSequenceClassification from ImpactTom6819 +author: John Snow Labs +name: food_not_food_distill_bert_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`food_not_food_distill_bert_pipeline` is a English model originally trained by ImpactTom6819. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/food_not_food_distill_bert_pipeline_en_5.5.0_3.0_1727205019453.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/food_not_food_distill_bert_pipeline_en_5.5.0_3.0_1727205019453.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("food_not_food_distill_bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("food_not_food_distill_bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|food_not_food_distill_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/ImpactTom6819/food_not_food_distill-bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-fullcombined_manifesto10000_en.md b/docs/_posts/ahmedlone127/2024-09-24-fullcombined_manifesto10000_en.md new file mode 100644 index 00000000000000..7f4ce92bb9fe52 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-fullcombined_manifesto10000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English fullcombined_manifesto10000 RoBertaForSequenceClassification from jordankrishnayah +author: John Snow Labs +name: fullcombined_manifesto10000 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fullcombined_manifesto10000` is a English model originally trained by jordankrishnayah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fullcombined_manifesto10000_en_5.5.0_3.0_1727171869930.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fullcombined_manifesto10000_en_5.5.0_3.0_1727171869930.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("fullcombined_manifesto10000","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("fullcombined_manifesto10000", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fullcombined_manifesto10000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/jordankrishnayah/fullCombined-manifesto10000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-fullcombined_manifesto10000_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-fullcombined_manifesto10000_pipeline_en.md new file mode 100644 index 00000000000000..6c3d877a79ca51 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-fullcombined_manifesto10000_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English fullcombined_manifesto10000_pipeline pipeline RoBertaForSequenceClassification from jordankrishnayah +author: John Snow Labs +name: fullcombined_manifesto10000_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fullcombined_manifesto10000_pipeline` is a English model originally trained by jordankrishnayah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fullcombined_manifesto10000_pipeline_en_5.5.0_3.0_1727171893131.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fullcombined_manifesto10000_pipeline_en_5.5.0_3.0_1727171893131.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("fullcombined_manifesto10000_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("fullcombined_manifesto10000_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fullcombined_manifesto10000_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.2 MB| + +## References + +https://huggingface.co/jordankrishnayah/fullCombined-manifesto10000 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-furina_en.md b/docs/_posts/ahmedlone127/2024-09-24-furina_en.md new file mode 100644 index 00000000000000..54662528e5292b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-furina_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English furina XlmRoBertaEmbeddings from yihongLiu +author: John Snow Labs +name: furina +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, xlm_roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`furina` is a English model originally trained by yihongLiu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/furina_en_5.5.0_3.0_1727209808625.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/furina_en_5.5.0_3.0_1727209808625.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = XlmRoBertaEmbeddings.pretrained("furina","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = XlmRoBertaEmbeddings.pretrained("furina","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|furina| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[xlm_roberta]| +|Language:|en| +|Size:|1.5 GB| + +## References + +https://huggingface.co/yihongLiu/furina \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-furina_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-furina_pipeline_en.md new file mode 100644 index 00000000000000..1dc2c83a5f5f6b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-furina_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English furina_pipeline pipeline XlmRoBertaEmbeddings from yihongLiu +author: John Snow Labs +name: furina_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`furina_pipeline` is a English model originally trained by yihongLiu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/furina_pipeline_en_5.5.0_3.0_1727209881517.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/furina_pipeline_en_5.5.0_3.0_1727209881517.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("furina_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("furina_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|furina_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.5 GB| + +## References + +https://huggingface.co/yihongLiu/furina + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-german_english_code_switching_bert_en.md b/docs/_posts/ahmedlone127/2024-09-24-german_english_code_switching_bert_en.md new file mode 100644 index 00000000000000..1a7a476fce42c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-german_english_code_switching_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English german_english_code_switching_bert BertEmbeddings from igorsterner +author: John Snow Labs +name: german_english_code_switching_bert +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`german_english_code_switching_bert` is a English model originally trained by igorsterner. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/german_english_code_switching_bert_en_5.5.0_3.0_1727220815162.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/german_english_code_switching_bert_en_5.5.0_3.0_1727220815162.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("german_english_code_switching_bert","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("german_english_code_switching_bert","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|german_english_code_switching_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|664.7 MB| + +## References + +https://huggingface.co/igorsterner/german-english-code-switching-bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-german_english_code_switching_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-german_english_code_switching_bert_pipeline_en.md new file mode 100644 index 00000000000000..a0917a247a894a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-german_english_code_switching_bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English german_english_code_switching_bert_pipeline pipeline BertEmbeddings from igorsterner +author: John Snow Labs +name: german_english_code_switching_bert_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`german_english_code_switching_bert_pipeline` is a English model originally trained by igorsterner. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/german_english_code_switching_bert_pipeline_en_5.5.0_3.0_1727220848520.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/german_english_code_switching_bert_pipeline_en_5.5.0_3.0_1727220848520.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("german_english_code_switching_bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("german_english_code_switching_bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|german_english_code_switching_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|664.7 MB| + +## References + +https://huggingface.co/igorsterner/german-english-code-switching-bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-hate_hate_balance_random3_seed1_bernice_en.md b/docs/_posts/ahmedlone127/2024-09-24-hate_hate_balance_random3_seed1_bernice_en.md new file mode 100644 index 00000000000000..534ae9ee51f926 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-hate_hate_balance_random3_seed1_bernice_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hate_hate_balance_random3_seed1_bernice XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_balance_random3_seed1_bernice +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_balance_random3_seed1_bernice` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random3_seed1_bernice_en_5.5.0_3.0_1727153045262.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random3_seed1_bernice_en_5.5.0_3.0_1727153045262.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("hate_hate_balance_random3_seed1_bernice","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("hate_hate_balance_random3_seed1_bernice", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_balance_random3_seed1_bernice| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|783.5 MB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_balance_random3_seed1-bernice \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-hate_hate_balance_random3_seed1_bernice_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-hate_hate_balance_random3_seed1_bernice_pipeline_en.md new file mode 100644 index 00000000000000..6654ba4b6ef657 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-hate_hate_balance_random3_seed1_bernice_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hate_hate_balance_random3_seed1_bernice_pipeline pipeline XlmRoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_balance_random3_seed1_bernice_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_balance_random3_seed1_bernice_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random3_seed1_bernice_pipeline_en_5.5.0_3.0_1727153189704.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_balance_random3_seed1_bernice_pipeline_en_5.5.0_3.0_1727153189704.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hate_hate_balance_random3_seed1_bernice_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hate_hate_balance_random3_seed1_bernice_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_balance_random3_seed1_bernice_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|783.5 MB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_balance_random3_seed1-bernice + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-hate_hate_random1_seed2_twitter_roberta_base_2022_154m_en.md b/docs/_posts/ahmedlone127/2024-09-24-hate_hate_random1_seed2_twitter_roberta_base_2022_154m_en.md new file mode 100644 index 00000000000000..1f26fc8c80f02a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-hate_hate_random1_seed2_twitter_roberta_base_2022_154m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hate_hate_random1_seed2_twitter_roberta_base_2022_154m RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_random1_seed2_twitter_roberta_base_2022_154m +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_random1_seed2_twitter_roberta_base_2022_154m` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_random1_seed2_twitter_roberta_base_2022_154m_en_5.5.0_3.0_1727171955647.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_random1_seed2_twitter_roberta_base_2022_154m_en_5.5.0_3.0_1727171955647.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_random1_seed2_twitter_roberta_base_2022_154m","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("hate_hate_random1_seed2_twitter_roberta_base_2022_154m", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_random1_seed2_twitter_roberta_base_2022_154m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.1 MB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_random1_seed2-twitter-roberta-base-2022-154m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-hate_hate_random1_seed2_twitter_roberta_base_2022_154m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-hate_hate_random1_seed2_twitter_roberta_base_2022_154m_pipeline_en.md new file mode 100644 index 00000000000000..f9b6205cd4474b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-hate_hate_random1_seed2_twitter_roberta_base_2022_154m_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English hate_hate_random1_seed2_twitter_roberta_base_2022_154m_pipeline pipeline RoBertaForSequenceClassification from tweettemposhift +author: John Snow Labs +name: hate_hate_random1_seed2_twitter_roberta_base_2022_154m_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_hate_random1_seed2_twitter_roberta_base_2022_154m_pipeline` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_hate_random1_seed2_twitter_roberta_base_2022_154m_pipeline_en_5.5.0_3.0_1727171978817.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_hate_random1_seed2_twitter_roberta_base_2022_154m_pipeline_en_5.5.0_3.0_1727171978817.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hate_hate_random1_seed2_twitter_roberta_base_2022_154m_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hate_hate_random1_seed2_twitter_roberta_base_2022_154m_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_hate_random1_seed2_twitter_roberta_base_2022_154m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.1 MB| + +## References + +https://huggingface.co/tweettemposhift/hate-hate_random1_seed2-twitter-roberta-base-2022-154m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-icebert_is.md b/docs/_posts/ahmedlone127/2024-09-24-icebert_is.md new file mode 100644 index 00000000000000..4083b773b0582d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-icebert_is.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Icelandic icebert RoBertaEmbeddings from mideind +author: John Snow Labs +name: icebert +date: 2024-09-24 +tags: [is, open_source, onnx, embeddings, roberta] +task: Embeddings +language: is +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`icebert` is a Icelandic model originally trained by mideind. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/icebert_is_5.5.0_3.0_1727216135268.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/icebert_is_5.5.0_3.0_1727216135268.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("icebert","is") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("icebert","is") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|icebert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|is| +|Size:|296.5 MB| + +## References + +https://huggingface.co/mideind/IceBERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-icebert_pipeline_is.md b/docs/_posts/ahmedlone127/2024-09-24-icebert_pipeline_is.md new file mode 100644 index 00000000000000..2257169e92fe4b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-icebert_pipeline_is.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Icelandic icebert_pipeline pipeline RoBertaEmbeddings from mideind +author: John Snow Labs +name: icebert_pipeline +date: 2024-09-24 +tags: [is, open_source, pipeline, onnx] +task: Embeddings +language: is +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`icebert_pipeline` is a Icelandic model originally trained by mideind. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/icebert_pipeline_is_5.5.0_3.0_1727216223007.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/icebert_pipeline_is_5.5.0_3.0_1727216223007.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("icebert_pipeline", lang = "is") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("icebert_pipeline", lang = "is") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|icebert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|is| +|Size:|296.5 MB| + +## References + +https://huggingface.co/mideind/IceBERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-indobert_lite_squad_id.md b/docs/_posts/ahmedlone127/2024-09-24-indobert_lite_squad_id.md new file mode 100644 index 00000000000000..d594f245983c8f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-indobert_lite_squad_id.md @@ -0,0 +1,86 @@ +--- +layout: model +title: Indonesian indobert_lite_squad BertForQuestionAnswering from Wikidepia +author: John Snow Labs +name: indobert_lite_squad +date: 2024-09-24 +tags: [id, open_source, onnx, question_answering, bert] +task: Question Answering +language: id +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`indobert_lite_squad` is a Indonesian model originally trained by Wikidepia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/indobert_lite_squad_id_5.5.0_3.0_1727206899323.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/indobert_lite_squad_id_5.5.0_3.0_1727206899323.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("indobert_lite_squad","id") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("indobert_lite_squad", "id") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|indobert_lite_squad| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|id| +|Size:|41.9 MB| + +## References + +https://huggingface.co/Wikidepia/indobert-lite-squad \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-indobert_lite_squad_pipeline_id.md b/docs/_posts/ahmedlone127/2024-09-24-indobert_lite_squad_pipeline_id.md new file mode 100644 index 00000000000000..c2b885ecca5f85 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-indobert_lite_squad_pipeline_id.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Indonesian indobert_lite_squad_pipeline pipeline BertForQuestionAnswering from Wikidepia +author: John Snow Labs +name: indobert_lite_squad_pipeline +date: 2024-09-24 +tags: [id, open_source, pipeline, onnx] +task: Question Answering +language: id +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`indobert_lite_squad_pipeline` is a Indonesian model originally trained by Wikidepia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/indobert_lite_squad_pipeline_id_5.5.0_3.0_1727206901687.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/indobert_lite_squad_pipeline_id_5.5.0_3.0_1727206901687.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("indobert_lite_squad_pipeline", lang = "id") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("indobert_lite_squad_pipeline", lang = "id") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|indobert_lite_squad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|id| +|Size:|41.9 MB| + +## References + +https://huggingface.co/Wikidepia/indobert-lite-squad + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-jmedroberta_base_sentencepiece_ja.md b/docs/_posts/ahmedlone127/2024-09-24-jmedroberta_base_sentencepiece_ja.md new file mode 100644 index 00000000000000..12adfbad41ad93 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-jmedroberta_base_sentencepiece_ja.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Japanese jmedroberta_base_sentencepiece BertEmbeddings from alabnii +author: John Snow Labs +name: jmedroberta_base_sentencepiece +date: 2024-09-24 +tags: [ja, open_source, onnx, embeddings, bert] +task: Embeddings +language: ja +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`jmedroberta_base_sentencepiece` is a Japanese model originally trained by alabnii. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/jmedroberta_base_sentencepiece_ja_5.5.0_3.0_1727220944206.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/jmedroberta_base_sentencepiece_ja_5.5.0_3.0_1727220944206.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("jmedroberta_base_sentencepiece","ja") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("jmedroberta_base_sentencepiece","ja") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jmedroberta_base_sentencepiece| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|ja| +|Size:|406.1 MB| + +## References + +https://huggingface.co/alabnii/jmedroberta-base-sentencepiece \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-kpfbert_korquad_1_en.md b/docs/_posts/ahmedlone127/2024-09-24-kpfbert_korquad_1_en.md new file mode 100644 index 00000000000000..196e6a669b836b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-kpfbert_korquad_1_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English kpfbert_korquad_1 BertForQuestionAnswering from eeeyounglee +author: John Snow Labs +name: kpfbert_korquad_1 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kpfbert_korquad_1` is a English model originally trained by eeeyounglee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kpfbert_korquad_1_en_5.5.0_3.0_1727176045436.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kpfbert_korquad_1_en_5.5.0_3.0_1727176045436.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("kpfbert_korquad_1","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("kpfbert_korquad_1", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kpfbert_korquad_1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|425.1 MB| + +## References + +https://huggingface.co/eeeyounglee/kpfbert-korquad-1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-kpfbert_korquad_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-kpfbert_korquad_1_pipeline_en.md new file mode 100644 index 00000000000000..7e2243585c1af3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-kpfbert_korquad_1_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English kpfbert_korquad_1_pipeline pipeline BertForQuestionAnswering from eeeyounglee +author: John Snow Labs +name: kpfbert_korquad_1_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kpfbert_korquad_1_pipeline` is a English model originally trained by eeeyounglee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kpfbert_korquad_1_pipeline_en_5.5.0_3.0_1727176067447.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kpfbert_korquad_1_pipeline_en_5.5.0_3.0_1727176067447.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("kpfbert_korquad_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("kpfbert_korquad_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kpfbert_korquad_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|425.1 MB| + +## References + +https://huggingface.co/eeeyounglee/kpfbert-korquad-1 + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-legalbert_large_1_7m_2_class_actions_en.md b/docs/_posts/ahmedlone127/2024-09-24-legalbert_large_1_7m_2_class_actions_en.md new file mode 100644 index 00000000000000..cddba3e62a1f12 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-legalbert_large_1_7m_2_class_actions_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English legalbert_large_1_7m_2_class_actions BertForSequenceClassification from afsuarezg +author: John Snow Labs +name: legalbert_large_1_7m_2_class_actions +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`legalbert_large_1_7m_2_class_actions` is a English model originally trained by afsuarezg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/legalbert_large_1_7m_2_class_actions_en_5.5.0_3.0_1727221908635.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/legalbert_large_1_7m_2_class_actions_en_5.5.0_3.0_1727221908635.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("legalbert_large_1_7m_2_class_actions","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("legalbert_large_1_7m_2_class_actions", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legalbert_large_1_7m_2_class_actions| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/afsuarezg/legalbert-large-1.7M-2_class_actions \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-legalbert_large_1_7m_2_class_actions_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-legalbert_large_1_7m_2_class_actions_pipeline_en.md new file mode 100644 index 00000000000000..5396e465f0f928 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-legalbert_large_1_7m_2_class_actions_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English legalbert_large_1_7m_2_class_actions_pipeline pipeline BertForSequenceClassification from afsuarezg +author: John Snow Labs +name: legalbert_large_1_7m_2_class_actions_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`legalbert_large_1_7m_2_class_actions_pipeline` is a English model originally trained by afsuarezg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/legalbert_large_1_7m_2_class_actions_pipeline_en_5.5.0_3.0_1727221974056.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/legalbert_large_1_7m_2_class_actions_pipeline_en_5.5.0_3.0_1727221974056.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("legalbert_large_1_7m_2_class_actions_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("legalbert_large_1_7m_2_class_actions_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legalbert_large_1_7m_2_class_actions_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/afsuarezg/legalbert-large-1.7M-2_class_actions + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-less_300000_xlm_roberta_mmar_recipe_10_en.md b/docs/_posts/ahmedlone127/2024-09-24-less_300000_xlm_roberta_mmar_recipe_10_en.md new file mode 100644 index 00000000000000..7ce9c45d9ec7c1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-less_300000_xlm_roberta_mmar_recipe_10_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English less_300000_xlm_roberta_mmar_recipe_10 XlmRoBertaEmbeddings from CennetOguz +author: John Snow Labs +name: less_300000_xlm_roberta_mmar_recipe_10 +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, xlm_roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`less_300000_xlm_roberta_mmar_recipe_10` is a English model originally trained by CennetOguz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/less_300000_xlm_roberta_mmar_recipe_10_en_5.5.0_3.0_1727209434850.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/less_300000_xlm_roberta_mmar_recipe_10_en_5.5.0_3.0_1727209434850.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = XlmRoBertaEmbeddings.pretrained("less_300000_xlm_roberta_mmar_recipe_10","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = XlmRoBertaEmbeddings.pretrained("less_300000_xlm_roberta_mmar_recipe_10","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|less_300000_xlm_roberta_mmar_recipe_10| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[xlm_roberta]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/CennetOguz/less_300000_xlm_roberta_mmar_recipe_10 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-less_300000_xlm_roberta_mmar_recipe_10_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-less_300000_xlm_roberta_mmar_recipe_10_pipeline_en.md new file mode 100644 index 00000000000000..de25cac91fa649 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-less_300000_xlm_roberta_mmar_recipe_10_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English less_300000_xlm_roberta_mmar_recipe_10_pipeline pipeline XlmRoBertaEmbeddings from CennetOguz +author: John Snow Labs +name: less_300000_xlm_roberta_mmar_recipe_10_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`less_300000_xlm_roberta_mmar_recipe_10_pipeline` is a English model originally trained by CennetOguz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/less_300000_xlm_roberta_mmar_recipe_10_pipeline_en_5.5.0_3.0_1727209488775.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/less_300000_xlm_roberta_mmar_recipe_10_pipeline_en_5.5.0_3.0_1727209488775.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("less_300000_xlm_roberta_mmar_recipe_10_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("less_300000_xlm_roberta_mmar_recipe_10_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|less_300000_xlm_roberta_mmar_recipe_10_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/CennetOguz/less_300000_xlm_roberta_mmar_recipe_10 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-lnmt15_en.md b/docs/_posts/ahmedlone127/2024-09-24-lnmt15_en.md new file mode 100644 index 00000000000000..23a811e4d04498 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-lnmt15_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English lnmt15 DistilBertForSequenceClassification from carmenlozano +author: John Snow Labs +name: lnmt15 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`lnmt15` is a English model originally trained by carmenlozano. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/lnmt15_en_5.5.0_3.0_1727154835592.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/lnmt15_en_5.5.0_3.0_1727154835592.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("lnmt15","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("lnmt15", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|lnmt15| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.6 MB| + +## References + +https://huggingface.co/carmenlozano/lnmt15 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-mefmqgve_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-mefmqgve_pipeline_en.md new file mode 100644 index 00000000000000..f9f49c85cde4a4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-mefmqgve_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mefmqgve_pipeline pipeline DistilBertForSequenceClassification from chernandezc +author: John Snow Labs +name: mefmqgve_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mefmqgve_pipeline` is a English model originally trained by chernandezc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mefmqgve_pipeline_en_5.5.0_3.0_1727154653152.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mefmqgve_pipeline_en_5.5.0_3.0_1727154653152.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mefmqgve_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mefmqgve_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mefmqgve_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/chernandezc/mefmqgve + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-mentbert_en.md b/docs/_posts/ahmedlone127/2024-09-24-mentbert_en.md new file mode 100644 index 00000000000000..285aba8923595f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-mentbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mentbert BertForSequenceClassification from reab5555 +author: John Snow Labs +name: mentbert +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mentbert` is a English model originally trained by reab5555. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mentbert_en_5.5.0_3.0_1727219025111.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mentbert_en_5.5.0_3.0_1727219025111.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("mentbert","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("mentbert", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mentbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/reab5555/mentBERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-mentbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-mentbert_pipeline_en.md new file mode 100644 index 00000000000000..47077e1d5e43dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-mentbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mentbert_pipeline pipeline BertForSequenceClassification from reab5555 +author: John Snow Labs +name: mentbert_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mentbert_pipeline` is a English model originally trained by reab5555. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mentbert_pipeline_en_5.5.0_3.0_1727219051028.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mentbert_pipeline_en_5.5.0_3.0_1727219051028.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mentbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mentbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mentbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/reab5555/mentBERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-model_one_ashleyinust_en.md b/docs/_posts/ahmedlone127/2024-09-24-model_one_ashleyinust_en.md new file mode 100644 index 00000000000000..8e108b8f8fda51 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-model_one_ashleyinust_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English model_one_ashleyinust DistilBertForSequenceClassification from Ashleyinust +author: John Snow Labs +name: model_one_ashleyinust +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`model_one_ashleyinust` is a English model originally trained by Ashleyinust. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/model_one_ashleyinust_en_5.5.0_3.0_1727164141906.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/model_one_ashleyinust_en_5.5.0_3.0_1727164141906.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("model_one_ashleyinust","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("model_one_ashleyinust", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|model_one_ashleyinust| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Ashleyinust/model_one \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-modelocanal_es.md b/docs/_posts/ahmedlone127/2024-09-24-modelocanal_es.md new file mode 100644 index 00000000000000..fa9524fd9c2470 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-modelocanal_es.md @@ -0,0 +1,86 @@ +--- +layout: model +title: Castilian, Spanish modelocanal BertForQuestionAnswering from Antonio49 +author: John Snow Labs +name: modelocanal +date: 2024-09-24 +tags: [es, open_source, onnx, question_answering, bert] +task: Question Answering +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`modelocanal` is a Castilian, Spanish model originally trained by Antonio49. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/modelocanal_es_5.5.0_3.0_1727207066216.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/modelocanal_es_5.5.0_3.0_1727207066216.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("modelocanal","es") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("modelocanal", "es") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|modelocanal| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|es| +|Size:|409.7 MB| + +## References + +https://huggingface.co/Antonio49/ModeloCanal \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-modelocanal_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-24-modelocanal_pipeline_es.md new file mode 100644 index 00000000000000..3841510ad1d7f8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-modelocanal_pipeline_es.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Castilian, Spanish modelocanal_pipeline pipeline BertForQuestionAnswering from Antonio49 +author: John Snow Labs +name: modelocanal_pipeline +date: 2024-09-24 +tags: [es, open_source, pipeline, onnx] +task: Question Answering +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`modelocanal_pipeline` is a Castilian, Spanish model originally trained by Antonio49. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/modelocanal_pipeline_es_5.5.0_3.0_1727207087231.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/modelocanal_pipeline_es_5.5.0_3.0_1727207087231.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("modelocanal_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("modelocanal_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|modelocanal_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|409.7 MB| + +## References + +https://huggingface.co/Antonio49/ModeloCanal + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-multi_label_classification_venkatarajendra_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-multi_label_classification_venkatarajendra_pipeline_en.md new file mode 100644 index 00000000000000..67f5daaac44ab9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-multi_label_classification_venkatarajendra_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English multi_label_classification_venkatarajendra_pipeline pipeline RoBertaForSequenceClassification from venkatarajendra +author: John Snow Labs +name: multi_label_classification_venkatarajendra_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`multi_label_classification_venkatarajendra_pipeline` is a English model originally trained by venkatarajendra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/multi_label_classification_venkatarajendra_pipeline_en_5.5.0_3.0_1727171524292.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/multi_label_classification_venkatarajendra_pipeline_en_5.5.0_3.0_1727171524292.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("multi_label_classification_venkatarajendra_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("multi_label_classification_venkatarajendra_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|multi_label_classification_venkatarajendra_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|457.1 MB| + +## References + +https://huggingface.co/venkatarajendra/multi-label-classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-ner_ner_random2_seed2_roberta_large_en.md b/docs/_posts/ahmedlone127/2024-09-24-ner_ner_random2_seed2_roberta_large_en.md new file mode 100644 index 00000000000000..a95a6d13cecc77 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-ner_ner_random2_seed2_roberta_large_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ner_ner_random2_seed2_roberta_large RoBertaForTokenClassification from tweettemposhift +author: John Snow Labs +name: ner_ner_random2_seed2_roberta_large +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_ner_random2_seed2_roberta_large` is a English model originally trained by tweettemposhift. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_ner_random2_seed2_roberta_large_en_5.5.0_3.0_1727151411205.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_ner_random2_seed2_roberta_large_en_5.5.0_3.0_1727151411205.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("ner_ner_random2_seed2_roberta_large","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("ner_ner_random2_seed2_roberta_large", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_ner_random2_seed2_roberta_large| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/tweettemposhift/ner-ner_random2_seed2-roberta-large \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-nlp2_base_3e_5_en.md b/docs/_posts/ahmedlone127/2024-09-24-nlp2_base_3e_5_en.md new file mode 100644 index 00000000000000..4234292ba2336b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-nlp2_base_3e_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nlp2_base_3e_5 DistilBertForSequenceClassification from VRT-2428211 +author: John Snow Labs +name: nlp2_base_3e_5 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nlp2_base_3e_5` is a English model originally trained by VRT-2428211. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp2_base_3e_5_en_5.5.0_3.0_1727154918792.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp2_base_3e_5_en_5.5.0_3.0_1727154918792.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("nlp2_base_3e_5","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("nlp2_base_3e_5", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nlp2_base_3e_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/VRT-2428211/NLP2_Base_3e-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-nlp2_base_3e_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-nlp2_base_3e_5_pipeline_en.md new file mode 100644 index 00000000000000..4a990b965d2d66 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-nlp2_base_3e_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nlp2_base_3e_5_pipeline pipeline DistilBertForSequenceClassification from VRT-2428211 +author: John Snow Labs +name: nlp2_base_3e_5_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nlp2_base_3e_5_pipeline` is a English model originally trained by VRT-2428211. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp2_base_3e_5_pipeline_en_5.5.0_3.0_1727154932055.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp2_base_3e_5_pipeline_en_5.5.0_3.0_1727154932055.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nlp2_base_3e_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nlp2_base_3e_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nlp2_base_3e_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/VRT-2428211/NLP2_Base_3e-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-norwegian_repeats_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-norwegian_repeats_pipeline_en.md new file mode 100644 index 00000000000000..c9f67c7f15c673 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-norwegian_repeats_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English norwegian_repeats_pipeline pipeline XlmRoBertaForTokenClassification from grace-pro +author: John Snow Labs +name: norwegian_repeats_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`norwegian_repeats_pipeline` is a English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/norwegian_repeats_pipeline_en_5.5.0_3.0_1727174699382.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/norwegian_repeats_pipeline_en_5.5.0_3.0_1727174699382.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("norwegian_repeats_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("norwegian_repeats_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|norwegian_repeats_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/grace-pro/no_repeats + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-opus_maltese_arabic_english_en.md b/docs/_posts/ahmedlone127/2024-09-24-opus_maltese_arabic_english_en.md new file mode 100644 index 00000000000000..4098b18d135b34 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-opus_maltese_arabic_english_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opus_maltese_arabic_english MarianTransformer from finnstrom3693 +author: John Snow Labs +name: opus_maltese_arabic_english +date: 2024-09-24 +tags: [en, open_source, onnx, translation, marian] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_arabic_english` is a English model originally trained by finnstrom3693. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_arabic_english_en_5.5.0_3.0_1727166100814.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_arabic_english_en_5.5.0_3.0_1727166100814.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("translation") + +marian = MarianTransformer.pretrained("opus_maltese_arabic_english","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val marian = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = MarianTransformer.pretrained("opus_maltese_arabic_english","en") + .setInputCols(Array("sentence")) + .setOutputCol("translation") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_arabic_english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|en| +|Size:|335.5 MB| + +## References + +https://huggingface.co/finnstrom3693/opus-mt-ar-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-opus_maltese_english_indonesian_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-opus_maltese_english_indonesian_pipeline_en.md new file mode 100644 index 00000000000000..a69b093ece070a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-opus_maltese_english_indonesian_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_maltese_english_indonesian_pipeline pipeline MarianTransformer from finnstrom3693 +author: John Snow Labs +name: opus_maltese_english_indonesian_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Translation +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_maltese_english_indonesian_pipeline` is a English model originally trained by finnstrom3693. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_maltese_english_indonesian_pipeline_en_5.5.0_3.0_1727166577448.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_maltese_english_indonesian_pipeline_en_5.5.0_3.0_1727166577448.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_maltese_english_indonesian_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_maltese_english_indonesian_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_maltese_english_indonesian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|307.8 MB| + +## References + +https://huggingface.co/finnstrom3693/opus-mt-en-id + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-output_sonyy_en.md b/docs/_posts/ahmedlone127/2024-09-24-output_sonyy_en.md new file mode 100644 index 00000000000000..77bb74b426b1ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-output_sonyy_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English output_sonyy DistilBertForSequenceClassification from sonyy +author: John Snow Labs +name: output_sonyy +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`output_sonyy` is a English model originally trained by sonyy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/output_sonyy_en_5.5.0_3.0_1727164667739.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/output_sonyy_en_5.5.0_3.0_1727164667739.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("output_sonyy","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("output_sonyy", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|output_sonyy| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/sonyy/output \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-patient_doctor_text_classifier_eng_en.md b/docs/_posts/ahmedlone127/2024-09-24-patient_doctor_text_classifier_eng_en.md new file mode 100644 index 00000000000000..d808054f1f8117 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-patient_doctor_text_classifier_eng_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English patient_doctor_text_classifier_eng DistilBertForSequenceClassification from LukeGPT88 +author: John Snow Labs +name: patient_doctor_text_classifier_eng +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`patient_doctor_text_classifier_eng` is a English model originally trained by LukeGPT88. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/patient_doctor_text_classifier_eng_en_5.5.0_3.0_1727204907822.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/patient_doctor_text_classifier_eng_en_5.5.0_3.0_1727204907822.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("patient_doctor_text_classifier_eng","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("patient_doctor_text_classifier_eng", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|patient_doctor_text_classifier_eng| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LukeGPT88/patient-doctor-text-classifier-eng \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-patient_doctor_text_classifier_eng_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-patient_doctor_text_classifier_eng_pipeline_en.md new file mode 100644 index 00000000000000..15709d42deeaf8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-patient_doctor_text_classifier_eng_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English patient_doctor_text_classifier_eng_pipeline pipeline DistilBertForSequenceClassification from LukeGPT88 +author: John Snow Labs +name: patient_doctor_text_classifier_eng_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`patient_doctor_text_classifier_eng_pipeline` is a English model originally trained by LukeGPT88. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/patient_doctor_text_classifier_eng_pipeline_en_5.5.0_3.0_1727204921327.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/patient_doctor_text_classifier_eng_pipeline_en_5.5.0_3.0_1727204921327.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("patient_doctor_text_classifier_eng_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("patient_doctor_text_classifier_eng_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|patient_doctor_text_classifier_eng_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/LukeGPT88/patient-doctor-text-classifier-eng + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-prueba4_en.md b/docs/_posts/ahmedlone127/2024-09-24-prueba4_en.md new file mode 100644 index 00000000000000..80f8a720a467d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-prueba4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English prueba4 RoBertaForSequenceClassification from Saul98lm +author: John Snow Labs +name: prueba4 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`prueba4` is a English model originally trained by Saul98lm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/prueba4_en_5.5.0_3.0_1727172004997.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/prueba4_en_5.5.0_3.0_1727172004997.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("prueba4","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("prueba4", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|prueba4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|308.6 MB| + +## References + +https://huggingface.co/Saul98lm/Prueba4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-python_code_comment_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-python_code_comment_classification_pipeline_en.md new file mode 100644 index 00000000000000..76e74dbcebf851 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-python_code_comment_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English python_code_comment_classification_pipeline pipeline BertEmbeddings from ZarahShibli +author: John Snow Labs +name: python_code_comment_classification_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`python_code_comment_classification_pipeline` is a English model originally trained by ZarahShibli. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/python_code_comment_classification_pipeline_en_5.5.0_3.0_1727161855350.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/python_code_comment_classification_pipeline_en_5.5.0_3.0_1727161855350.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("python_code_comment_classification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("python_code_comment_classification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|python_code_comment_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.7 MB| + +## References + +https://huggingface.co/ZarahShibli/python-code-comment-classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-quiltnet_b_16_en.md b/docs/_posts/ahmedlone127/2024-09-24-quiltnet_b_16_en.md new file mode 100644 index 00000000000000..e1a038cec6a2a5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-quiltnet_b_16_en.md @@ -0,0 +1,120 @@ +--- +layout: model +title: English quiltnet_b_16 CLIPForZeroShotClassification from wisdomik +author: John Snow Labs +name: quiltnet_b_16 +date: 2024-09-24 +tags: [en, open_source, onnx, zero_shot, clip, image] +task: Zero-Shot Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: CLIPForZeroShotClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained CLIPForZeroShotClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`quiltnet_b_16` is a English model originally trained by wisdomik. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/quiltnet_b_16_en_5.5.0_3.0_1727207720261.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/quiltnet_b_16_en_5.5.0_3.0_1727207720261.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +imageDF = spark.read \ + .format("image") \ + .option("dropInvalid", value = True) \ + .load("src/test/resources/image/") + +candidateLabels = [ + "a photo of a bird", + "a photo of a cat", + "a photo of a dog", + "a photo of a hen", + "a photo of a hippo", + "a photo of a room", + "a photo of a tractor", + "a photo of an ostrich", + "a photo of an ox"] + +ImageAssembler = ImageAssembler() \ + .setInputCol("image") \ + .setOutputCol("image_assembler") + +imageClassifier = CLIPForZeroShotClassification.pretrained("quiltnet_b_16","en") \ + .setInputCols(["image_assembler"]) \ + .setOutputCol("label") \ + .setCandidateLabels(candidateLabels) + +pipeline = Pipeline().setStages([ImageAssembler, imageClassifier]) +pipelineModel = pipeline.fit(imageDF) +pipelineDF = pipelineModel.transform(imageDF) + + +``` +```scala + + +val imageDF = ResourceHelper.spark.read + .format("image") + .option("dropInvalid", value = true) + .load("src/test/resources/image/") + +val candidateLabels = Array( + "a photo of a bird", + "a photo of a cat", + "a photo of a dog", + "a photo of a hen", + "a photo of a hippo", + "a photo of a room", + "a photo of a tractor", + "a photo of an ostrich", + "a photo of an ox") + +val imageAssembler = new ImageAssembler() + .setInputCol("image") + .setOutputCol("image_assembler") + +val imageClassifier = CLIPForZeroShotClassification.pretrained("quiltnet_b_16","en") \ + .setInputCols(Array("image_assembler")) \ + .setOutputCol("label") \ + .setCandidateLabels(candidateLabels) + +val pipeline = new Pipeline().setStages(Array(imageAssembler, imageClassifier)) +val pipelineModel = pipeline.fit(imageDF) +val pipelineDF = pipelineModel.transform(imageDF) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|quiltnet_b_16| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[image_assembler]| +|Output Labels:|[label]| +|Language:|en| +|Size:|561.2 MB| + +## References + +https://huggingface.co/wisdomik/QuiltNet-B-16 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-quiltnet_b_16_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-quiltnet_b_16_pipeline_en.md new file mode 100644 index 00000000000000..0a49e7afa32910 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-quiltnet_b_16_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English quiltnet_b_16_pipeline pipeline CLIPForZeroShotClassification from wisdomik +author: John Snow Labs +name: quiltnet_b_16_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Zero-Shot Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained CLIPForZeroShotClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`quiltnet_b_16_pipeline` is a English model originally trained by wisdomik. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/quiltnet_b_16_pipeline_en_5.5.0_3.0_1727207751751.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/quiltnet_b_16_pipeline_en_5.5.0_3.0_1727207751751.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("quiltnet_b_16_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("quiltnet_b_16_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|quiltnet_b_16_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|561.2 MB| + +## References + +https://huggingface.co/wisdomik/QuiltNet-B-16 + +## Included Models + +- ImageAssembler +- CLIPForZeroShotClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-rejection_detection_en.md b/docs/_posts/ahmedlone127/2024-09-24-rejection_detection_en.md new file mode 100644 index 00000000000000..5580bf829837e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-rejection_detection_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English rejection_detection RoBertaForSequenceClassification from holistic-ai +author: John Snow Labs +name: rejection_detection +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rejection_detection` is a English model originally trained by holistic-ai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rejection_detection_en_5.5.0_3.0_1727211994868.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rejection_detection_en_5.5.0_3.0_1727211994868.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("rejection_detection","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("rejection_detection", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rejection_detection| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|309.0 MB| + +## References + +https://huggingface.co/holistic-ai/rejection_detection \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-rejection_detection_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-rejection_detection_pipeline_en.md new file mode 100644 index 00000000000000..d99e8706b7d3ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-rejection_detection_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English rejection_detection_pipeline pipeline RoBertaForSequenceClassification from holistic-ai +author: John Snow Labs +name: rejection_detection_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rejection_detection_pipeline` is a English model originally trained by holistic-ai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rejection_detection_pipeline_en_5.5.0_3.0_1727212010929.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rejection_detection_pipeline_en_5.5.0_3.0_1727212010929.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("rejection_detection_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("rejection_detection_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rejection_detection_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|309.1 MB| + +## References + +https://huggingface.co/holistic-ai/rejection_detection + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-repo_31_5_mlops_zh0rg_en.md b/docs/_posts/ahmedlone127/2024-09-24-repo_31_5_mlops_zh0rg_en.md new file mode 100644 index 00000000000000..f0e3c483717086 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-repo_31_5_mlops_zh0rg_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English repo_31_5_mlops_zh0rg DistilBertForSequenceClassification from Zh0rg +author: John Snow Labs +name: repo_31_5_mlops_zh0rg +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`repo_31_5_mlops_zh0rg` is a English model originally trained by Zh0rg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/repo_31_5_mlops_zh0rg_en_5.5.0_3.0_1727154615680.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/repo_31_5_mlops_zh0rg_en_5.5.0_3.0_1727154615680.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("repo_31_5_mlops_zh0rg","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("repo_31_5_mlops_zh0rg", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|repo_31_5_mlops_zh0rg| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Zh0rg/repo-31-5-MLOps \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-results_deberta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-results_deberta_pipeline_en.md new file mode 100644 index 00000000000000..b9dd93af5b0b57 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-results_deberta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English results_deberta_pipeline pipeline DeBertaForSequenceClassification from Siddartha10 +author: John Snow Labs +name: results_deberta_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DeBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`results_deberta_pipeline` is a English model originally trained by Siddartha10. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/results_deberta_pipeline_en_5.5.0_3.0_1727162483793.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/results_deberta_pipeline_en_5.5.0_3.0_1727162483793.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("results_deberta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("results_deberta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|results_deberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|641.0 MB| + +## References + +https://huggingface.co/Siddartha10/results_deberta + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DeBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-robbert_v2_dutch_base_finetuned_emotion_en.md b/docs/_posts/ahmedlone127/2024-09-24-robbert_v2_dutch_base_finetuned_emotion_en.md new file mode 100644 index 00000000000000..ba351b66968ff1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-robbert_v2_dutch_base_finetuned_emotion_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English robbert_v2_dutch_base_finetuned_emotion RoBertaForSequenceClassification from antalvdb +author: John Snow Labs +name: robbert_v2_dutch_base_finetuned_emotion +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robbert_v2_dutch_base_finetuned_emotion` is a English model originally trained by antalvdb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robbert_v2_dutch_base_finetuned_emotion_en_5.5.0_3.0_1727211566914.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robbert_v2_dutch_base_finetuned_emotion_en_5.5.0_3.0_1727211566914.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("robbert_v2_dutch_base_finetuned_emotion","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("robbert_v2_dutch_base_finetuned_emotion", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robbert_v2_dutch_base_finetuned_emotion| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|437.9 MB| + +## References + +https://huggingface.co/antalvdb/robbert-v2-dutch-base-finetuned-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-robbert_v2_dutch_base_finetuned_emotion_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-robbert_v2_dutch_base_finetuned_emotion_pipeline_en.md new file mode 100644 index 00000000000000..3aac848a351caf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-robbert_v2_dutch_base_finetuned_emotion_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English robbert_v2_dutch_base_finetuned_emotion_pipeline pipeline RoBertaForSequenceClassification from antalvdb +author: John Snow Labs +name: robbert_v2_dutch_base_finetuned_emotion_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robbert_v2_dutch_base_finetuned_emotion_pipeline` is a English model originally trained by antalvdb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robbert_v2_dutch_base_finetuned_emotion_pipeline_en_5.5.0_3.0_1727211590894.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robbert_v2_dutch_base_finetuned_emotion_pipeline_en_5.5.0_3.0_1727211590894.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("robbert_v2_dutch_base_finetuned_emotion_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("robbert_v2_dutch_base_finetuned_emotion_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robbert_v2_dutch_base_finetuned_emotion_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|438.0 MB| + +## References + +https://huggingface.co/antalvdb/robbert-v2-dutch-base-finetuned-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_base_bne_capitel_ner_plantl_gob_es_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_bne_capitel_ner_plantl_gob_es_pipeline_es.md new file mode 100644 index 00000000000000..632a283276027f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_bne_capitel_ner_plantl_gob_es_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish roberta_base_bne_capitel_ner_plantl_gob_es_pipeline pipeline RoBertaForTokenClassification from PlanTL-GOB-ES +author: John Snow Labs +name: roberta_base_bne_capitel_ner_plantl_gob_es_pipeline +date: 2024-09-24 +tags: [es, open_source, pipeline, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_bne_capitel_ner_plantl_gob_es_pipeline` is a Castilian, Spanish model originally trained by PlanTL-GOB-ES. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_bne_capitel_ner_plantl_gob_es_pipeline_es_5.5.0_3.0_1727198929333.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_bne_capitel_ner_plantl_gob_es_pipeline_es_5.5.0_3.0_1727198929333.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_bne_capitel_ner_plantl_gob_es_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_bne_capitel_ner_plantl_gob_es_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_bne_capitel_ner_plantl_gob_es_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|456.6 MB| + +## References + +https://huggingface.co/PlanTL-GOB-ES/roberta-base-bne-capitel-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_base_epoch_53_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_epoch_53_en.md new file mode 100644 index 00000000000000..d315e6ef3d5fa1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_epoch_53_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_epoch_53 RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_53 +date: 2024-09-24 +tags: [en, open_source, onnx, embeddings, roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_53` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_53_en_5.5.0_3.0_1727168834976.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_53_en_5.5.0_3.0_1727168834976.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_53","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("roberta_base_epoch_53","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_53| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_53 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_base_epoch_53_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_epoch_53_pipeline_en.md new file mode 100644 index 00000000000000..02347dfcb08b70 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_epoch_53_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_epoch_53_pipeline pipeline RoBertaEmbeddings from yanaiela +author: John Snow Labs +name: roberta_base_epoch_53_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_epoch_53_pipeline` is a English model originally trained by yanaiela. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_53_pipeline_en_5.5.0_3.0_1727168921993.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_epoch_53_pipeline_en_5.5.0_3.0_1727168921993.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_epoch_53_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_epoch_53_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_epoch_53_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|297.3 MB| + +## References + +https://huggingface.co/yanaiela/roberta-base-epoch_53 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_base_legal_multi_downstream_indian_ner_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_legal_multi_downstream_indian_ner_en.md new file mode 100644 index 00000000000000..8e4dad798426f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_legal_multi_downstream_indian_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_legal_multi_downstream_indian_ner RoBertaForTokenClassification from MHGanainy +author: John Snow Labs +name: roberta_base_legal_multi_downstream_indian_ner +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_legal_multi_downstream_indian_ner` is a English model originally trained by MHGanainy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_legal_multi_downstream_indian_ner_en_5.5.0_3.0_1727195286050.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_legal_multi_downstream_indian_ner_en_5.5.0_3.0_1727195286050.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_base_legal_multi_downstream_indian_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_base_legal_multi_downstream_indian_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_legal_multi_downstream_indian_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|466.4 MB| + +## References + +https://huggingface.co/MHGanainy/roberta-base-legal-multi-downstream-indian-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_base_legal_multi_downstream_indian_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_legal_multi_downstream_indian_ner_pipeline_en.md new file mode 100644 index 00000000000000..25de030ffe5767 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_legal_multi_downstream_indian_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_base_legal_multi_downstream_indian_ner_pipeline pipeline RoBertaForTokenClassification from MHGanainy +author: John Snow Labs +name: roberta_base_legal_multi_downstream_indian_ner_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_legal_multi_downstream_indian_ner_pipeline` is a English model originally trained by MHGanainy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_legal_multi_downstream_indian_ner_pipeline_en_5.5.0_3.0_1727195309069.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_legal_multi_downstream_indian_ner_pipeline_en_5.5.0_3.0_1727195309069.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_base_legal_multi_downstream_indian_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_base_legal_multi_downstream_indian_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_legal_multi_downstream_indian_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.4 MB| + +## References + +https://huggingface.co/MHGanainy/roberta-base-legal-multi-downstream-indian-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_base_ours_rundi_2_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_ours_rundi_2_en.md new file mode 100644 index 00000000000000..e1009198313143 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_base_ours_rundi_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_base_ours_rundi_2 RoBertaForSequenceClassification from SkyR +author: John Snow Labs +name: roberta_base_ours_rundi_2 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_base_ours_rundi_2` is a English model originally trained by SkyR. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_base_ours_rundi_2_en_5.5.0_3.0_1727172175283.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_base_ours_rundi_2_en_5.5.0_3.0_1727172175283.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_ours_rundi_2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_base_ours_rundi_2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_base_ours_rundi_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|429.2 MB| + +## References + +https://huggingface.co/SkyR/roberta-base-ours-run-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_ganda_cased_malay_ner_v2_test_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_ganda_cased_malay_ner_v2_test_en.md new file mode 100644 index 00000000000000..56b701243fc978 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_ganda_cased_malay_ner_v2_test_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_ganda_cased_malay_ner_v2_test RoBertaForTokenClassification from nxaliao +author: John Snow Labs +name: roberta_ganda_cased_malay_ner_v2_test +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_ganda_cased_malay_ner_v2_test` is a English model originally trained by nxaliao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_ganda_cased_malay_ner_v2_test_en_5.5.0_3.0_1727151284445.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_ganda_cased_malay_ner_v2_test_en_5.5.0_3.0_1727151284445.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_ganda_cased_malay_ner_v2_test","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_ganda_cased_malay_ner_v2_test", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_ganda_cased_malay_ner_v2_test| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/nxaliao/roberta-lg-cased-ms-ner-v2-test \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_large_bc4chemd_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_large_bc4chemd_en.md new file mode 100644 index 00000000000000..cf99c41a72ce41 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_large_bc4chemd_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_bc4chemd RoBertaForTokenClassification from CheccoCando +author: John Snow Labs +name: roberta_large_bc4chemd +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_bc4chemd` is a English model originally trained by CheccoCando. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_bc4chemd_en_5.5.0_3.0_1727150722493.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_bc4chemd_en_5.5.0_3.0_1727150722493.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_bc4chemd","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_bc4chemd", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_bc4chemd| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/CheccoCando/roberta-large_bc4chemd \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil_en.md new file mode 100644 index 00000000000000..30c13686b4352d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil RoBertaForTokenClassification from gundapusunil +author: John Snow Labs +name: roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil` is a English model originally trained by gundapusunil. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil_en_5.5.0_3.0_1727139436438.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil_en_5.5.0_3.0_1727139436438.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/gundapusunil/RoBERTa-large-PM-M3-Voc-hf-finetuned-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil_pipeline_en.md new file mode 100644 index 00000000000000..bef9b67662e864 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil_pipeline pipeline RoBertaForTokenClassification from gundapusunil +author: John Snow Labs +name: roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil_pipeline` is a English model originally trained by gundapusunil. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil_pipeline_en_5.5.0_3.0_1727139516513.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil_pipeline_en_5.5.0_3.0_1727139516513.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_large_pm_m3_voc_hf_finetuned_ner_gundapusunil_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/gundapusunil/RoBERTa-large-PM-M3-Voc-hf-finetuned-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_qa_dpr_nq_reader_roberta_base_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_qa_dpr_nq_reader_roberta_base_en.md new file mode 100644 index 00000000000000..82764907a8ae99 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_qa_dpr_nq_reader_roberta_base_en.md @@ -0,0 +1,106 @@ +--- +layout: model +title: English RobertaForQuestionAnswering (from nlpconnect) +author: John Snow Labs +name: roberta_qa_dpr_nq_reader_roberta_base +date: 2024-09-24 +tags: [en, open_source, question_answering, roberta, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained Question Answering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `dpr-nq-reader-roberta-base` is a English model originally trained by `nlpconnect`. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_qa_dpr_nq_reader_roberta_base_en_5.5.0_3.0_1727210947078.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_qa_dpr_nq_reader_roberta_base_en_5.5.0_3.0_1727210947078.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +document_assembler = MultiDocumentAssembler() \ +.setInputCols(["question", "context"]) \ +.setOutputCols(["document_question", "document_context"]) + +spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_qa_dpr_nq_reader_roberta_base","en") \ +.setInputCols(["document_question", "document_context"]) \ +.setOutputCol("answer") \ +.setCaseSensitive(True) + +pipeline = Pipeline().setStages([ +document_assembler, +spanClassifier +]) + +example = spark.createDataFrame([["What's my name?", "My name is Clara and I live in Berkeley."]]).toDF("question", "context") + +result = pipeline.fit(example).transform(example) +``` +```scala +val document = new MultiDocumentAssembler() +.setInputCols("question", "context") +.setOutputCols("document_question", "document_context") + +val spanClassifier = RoBertaForQuestionAnswering +.pretrained("roberta_qa_dpr_nq_reader_roberta_base","en") +.setInputCols(Array("document_question", "document_context")) +.setOutputCol("answer") +.setCaseSensitive(true) +.setMaxSentenceLength(512) + +val pipeline = new Pipeline().setStages(Array(document, spanClassifier)) + +val example = Seq( +("Where was John Lenon born?", "John Lenon was born in London and lived in Paris. My name is Sarah and I live in London."), +("What's my name?", "My name is Clara and I live in Berkeley.")) +.toDF("question", "context") + +val result = pipeline.fit(example).transform(example) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.answer_question.roberta.base.by_nlpconnect").predict("""What's my name?|||"My name is Clara and I live in Berkeley.""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_qa_dpr_nq_reader_roberta_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|465.6 MB| + +## References + +References + +- https://huggingface.co/nlpconnect/dpr-nq-reader-roberta-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_qa_dpr_nq_reader_roberta_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_qa_dpr_nq_reader_roberta_base_pipeline_en.md new file mode 100644 index 00000000000000..6bffe1dc109ea5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_qa_dpr_nq_reader_roberta_base_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English roberta_qa_dpr_nq_reader_roberta_base_pipeline pipeline RoBertaForQuestionAnswering from nlpconnect +author: John Snow Labs +name: roberta_qa_dpr_nq_reader_roberta_base_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_qa_dpr_nq_reader_roberta_base_pipeline` is a English model originally trained by nlpconnect. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_qa_dpr_nq_reader_roberta_base_pipeline_en_5.5.0_3.0_1727210971188.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_qa_dpr_nq_reader_roberta_base_pipeline_en_5.5.0_3.0_1727210971188.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_qa_dpr_nq_reader_roberta_base_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_qa_dpr_nq_reader_roberta_base_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_qa_dpr_nq_reader_roberta_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.6 MB| + +## References + +https://huggingface.co/nlpconnect/dpr-nq-reader-roberta-base + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_qa_finetuned_state_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_qa_finetuned_state_pipeline_en.md new file mode 100644 index 00000000000000..c874d576fd4bf5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_qa_finetuned_state_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English roberta_qa_finetuned_state_pipeline pipeline RoBertaForQuestionAnswering from skandaonsolve +author: John Snow Labs +name: roberta_qa_finetuned_state_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_qa_finetuned_state_pipeline` is a English model originally trained by skandaonsolve. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_qa_finetuned_state_pipeline_en_5.5.0_3.0_1727211015905.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_qa_finetuned_state_pipeline_en_5.5.0_3.0_1727211015905.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_qa_finetuned_state_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_qa_finetuned_state_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_qa_finetuned_state_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|463.8 MB| + +## References + +https://huggingface.co/skandaonsolve/roberta-finetuned-state + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_qa_quales_iberlef_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_qa_quales_iberlef_en.md new file mode 100644 index 00000000000000..6992c4efec1b28 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_qa_quales_iberlef_en.md @@ -0,0 +1,106 @@ +--- +layout: model +title: English RobertaForQuestionAnswering (from stevemobs) +author: John Snow Labs +name: roberta_qa_quales_iberlef +date: 2024-09-24 +tags: [en, open_source, question_answering, roberta, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained Question Answering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `quales-iberlef` is a English model originally trained by `stevemobs`. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_qa_quales_iberlef_en_5.5.0_3.0_1727210853804.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_qa_quales_iberlef_en_5.5.0_3.0_1727210853804.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +document_assembler = MultiDocumentAssembler() \ +.setInputCols(["question", "context"]) \ +.setOutputCols(["document_question", "document_context"]) + +spanClassifier = RoBertaForQuestionAnswering.pretrained("roberta_qa_quales_iberlef","en") \ +.setInputCols(["document_question", "document_context"]) \ +.setOutputCol("answer") \ +.setCaseSensitive(True) + +pipeline = Pipeline().setStages([ +document_assembler, +spanClassifier +]) + +example = spark.createDataFrame([["What's my name?", "My name is Clara and I live in Berkeley."]]).toDF("question", "context") + +result = pipeline.fit(example).transform(example) +``` +```scala +val document = new MultiDocumentAssembler() +.setInputCols("question", "context") +.setOutputCols("document_question", "document_context") + +val spanClassifier = RoBertaForQuestionAnswering +.pretrained("roberta_qa_quales_iberlef","en") +.setInputCols(Array("document_question", "document_context")) +.setOutputCol("answer") +.setCaseSensitive(true) +.setMaxSentenceLength(512) + +val pipeline = new Pipeline().setStages(Array(document, spanClassifier)) + +val example = Seq( +("Where was John Lenon born?", "John Lenon was born in London and lived in Paris. My name is Sarah and I live in London."), +("What's my name?", "My name is Clara and I live in Berkeley.")) +.toDF("question", "context") + +val result = pipeline.fit(example).transform(example) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.answer_question.roberta.by_stevemobs").predict("""What's my name?|||"My name is Clara and I live in Berkeley.""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_qa_quales_iberlef| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|1.3 GB| + +## References + +References + +- https://huggingface.co/stevemobs/quales-iberlef \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_transfer2_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_transfer2_en.md new file mode 100644 index 00000000000000..ced9224d24576b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_transfer2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_transfer2 RoBertaForSequenceClassification from SOUMYADEEPSAR +author: John Snow Labs +name: roberta_transfer2 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_transfer2` is a English model originally trained by SOUMYADEEPSAR. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_transfer2_en_5.5.0_3.0_1727171890209.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_transfer2_en_5.5.0_3.0_1727171890209.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_transfer2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("roberta_transfer2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_transfer2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|435.7 MB| + +## References + +https://huggingface.co/SOUMYADEEPSAR/roberta_transfer2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-roberta_transfer2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-roberta_transfer2_pipeline_en.md new file mode 100644 index 00000000000000..ad754fddc8b261 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-roberta_transfer2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_transfer2_pipeline pipeline RoBertaForSequenceClassification from SOUMYADEEPSAR +author: John Snow Labs +name: roberta_transfer2_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_transfer2_pipeline` is a English model originally trained by SOUMYADEEPSAR. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_transfer2_pipeline_en_5.5.0_3.0_1727171925260.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_transfer2_pipeline_en_5.5.0_3.0_1727171925260.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_transfer2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_transfer2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_transfer2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|435.7 MB| + +## References + +https://huggingface.co/SOUMYADEEPSAR/roberta_transfer2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-robertalarge_finetuned_winogrande_en.md b/docs/_posts/ahmedlone127/2024-09-24-robertalarge_finetuned_winogrande_en.md new file mode 100644 index 00000000000000..6538b9c2a1038e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-robertalarge_finetuned_winogrande_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English robertalarge_finetuned_winogrande RoBertaForSequenceClassification from Kalslice +author: John Snow Labs +name: robertalarge_finetuned_winogrande +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robertalarge_finetuned_winogrande` is a English model originally trained by Kalslice. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robertalarge_finetuned_winogrande_en_5.5.0_3.0_1727167625898.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robertalarge_finetuned_winogrande_en_5.5.0_3.0_1727167625898.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("robertalarge_finetuned_winogrande","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("robertalarge_finetuned_winogrande", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robertalarge_finetuned_winogrande| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/Kalslice/robertalarge-finetuned-winogrande \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-rubert_conversational_cased_sentiment_pipeline_ru.md b/docs/_posts/ahmedlone127/2024-09-24-rubert_conversational_cased_sentiment_pipeline_ru.md new file mode 100644 index 00000000000000..461fb390820984 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-rubert_conversational_cased_sentiment_pipeline_ru.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Russian rubert_conversational_cased_sentiment_pipeline pipeline BertForSequenceClassification from MonoHime +author: John Snow Labs +name: rubert_conversational_cased_sentiment_pipeline +date: 2024-09-24 +tags: [ru, open_source, pipeline, onnx] +task: Text Classification +language: ru +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rubert_conversational_cased_sentiment_pipeline` is a Russian model originally trained by MonoHime. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rubert_conversational_cased_sentiment_pipeline_ru_5.5.0_3.0_1727214205347.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rubert_conversational_cased_sentiment_pipeline_ru_5.5.0_3.0_1727214205347.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("rubert_conversational_cased_sentiment_pipeline", lang = "ru") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("rubert_conversational_cased_sentiment_pipeline", lang = "ru") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rubert_conversational_cased_sentiment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ru| +|Size:|664.5 MB| + +## References + +https://huggingface.co/MonoHime/rubert_conversational_cased_sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-rubert_conversational_cased_sentiment_ru.md b/docs/_posts/ahmedlone127/2024-09-24-rubert_conversational_cased_sentiment_ru.md new file mode 100644 index 00000000000000..286c51a0aa3cd5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-rubert_conversational_cased_sentiment_ru.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Russian rubert_conversational_cased_sentiment BertForSequenceClassification from MonoHime +author: John Snow Labs +name: rubert_conversational_cased_sentiment +date: 2024-09-24 +tags: [ru, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: ru +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rubert_conversational_cased_sentiment` is a Russian model originally trained by MonoHime. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rubert_conversational_cased_sentiment_ru_5.5.0_3.0_1727214171413.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rubert_conversational_cased_sentiment_ru_5.5.0_3.0_1727214171413.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("rubert_conversational_cased_sentiment","ru") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("rubert_conversational_cased_sentiment", "ru") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rubert_conversational_cased_sentiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|ru| +|Size:|664.4 MB| + +## References + +https://huggingface.co/MonoHime/rubert_conversational_cased_sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-rubert_sentence_similarity_en.md b/docs/_posts/ahmedlone127/2024-09-24-rubert_sentence_similarity_en.md new file mode 100644 index 00000000000000..dd2261ab253528 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-rubert_sentence_similarity_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English rubert_sentence_similarity BertForSequenceClassification from AlanRobotics +author: John Snow Labs +name: rubert_sentence_similarity +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rubert_sentence_similarity` is a English model originally trained by AlanRobotics. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rubert_sentence_similarity_en_5.5.0_3.0_1727219170114.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rubert_sentence_similarity_en_5.5.0_3.0_1727219170114.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("rubert_sentence_similarity","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("rubert_sentence_similarity", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rubert_sentence_similarity| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|666.5 MB| + +## References + +https://huggingface.co/AlanRobotics/rubert-sentence-similarity \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-rubert_sentence_similarity_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-rubert_sentence_similarity_pipeline_en.md new file mode 100644 index 00000000000000..e9aadb3ea6c9ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-rubert_sentence_similarity_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English rubert_sentence_similarity_pipeline pipeline BertForSequenceClassification from AlanRobotics +author: John Snow Labs +name: rubert_sentence_similarity_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rubert_sentence_similarity_pipeline` is a English model originally trained by AlanRobotics. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rubert_sentence_similarity_pipeline_en_5.5.0_3.0_1727219203780.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rubert_sentence_similarity_pipeline_en_5.5.0_3.0_1727219203780.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("rubert_sentence_similarity_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("rubert_sentence_similarity_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rubert_sentence_similarity_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|666.5 MB| + +## References + +https://huggingface.co/AlanRobotics/rubert-sentence-similarity + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-securebert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-securebert_pipeline_en.md new file mode 100644 index 00000000000000..66b6727225c9ad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-securebert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English securebert_pipeline pipeline RoBertaEmbeddings from ehsanaghaei +author: John Snow Labs +name: securebert_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`securebert_pipeline` is a English model originally trained by ehsanaghaei. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/securebert_pipeline_en_5.5.0_3.0_1727216344948.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/securebert_pipeline_en_5.5.0_3.0_1727216344948.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("securebert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("securebert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|securebert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|466.1 MB| + +## References + +https://huggingface.co/ehsanaghaei/SecureBERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_afro_xlmr_base_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-24-sent_afro_xlmr_base_pipeline_xx.md new file mode 100644 index 00000000000000..3e06443b83d6b8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_afro_xlmr_base_pipeline_xx.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Multilingual sent_afro_xlmr_base_pipeline pipeline XlmRoBertaSentenceEmbeddings from Davlan +author: John Snow Labs +name: sent_afro_xlmr_base_pipeline +date: 2024-09-24 +tags: [xx, open_source, pipeline, onnx] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_afro_xlmr_base_pipeline` is a Multilingual model originally trained by Davlan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_afro_xlmr_base_pipeline_xx_5.5.0_3.0_1727205840413.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_afro_xlmr_base_pipeline_xx_5.5.0_3.0_1727205840413.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_afro_xlmr_base_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_afro_xlmr_base_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_afro_xlmr_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|1.0 GB| + +## References + +https://huggingface.co/Davlan/afro-xlmr-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- XlmRoBertaSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_afro_xlmr_base_xx.md b/docs/_posts/ahmedlone127/2024-09-24-sent_afro_xlmr_base_xx.md new file mode 100644 index 00000000000000..db2df7cd71fa08 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_afro_xlmr_base_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual sent_afro_xlmr_base XlmRoBertaSentenceEmbeddings from Davlan +author: John Snow Labs +name: sent_afro_xlmr_base +date: 2024-09-24 +tags: [xx, open_source, onnx, sentence_embeddings, xlm_roberta] +task: Embeddings +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_afro_xlmr_base` is a Multilingual model originally trained by Davlan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_afro_xlmr_base_xx_5.5.0_3.0_1727205787447.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_afro_xlmr_base_xx_5.5.0_3.0_1727205787447.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = XlmRoBertaSentenceEmbeddings.pretrained("sent_afro_xlmr_base","xx") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = XlmRoBertaSentenceEmbeddings.pretrained("sent_afro_xlmr_base","xx") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_afro_xlmr_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|xx| +|Size:|1.0 GB| + +## References + +https://huggingface.co/Davlan/afro-xlmr-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_batteryscibert_uncased_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_batteryscibert_uncased_en.md new file mode 100644 index 00000000000000..e2f15777a5d8ec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_batteryscibert_uncased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_batteryscibert_uncased BertSentenceEmbeddings from batterydata +author: John Snow Labs +name: sent_batteryscibert_uncased +date: 2024-09-24 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_batteryscibert_uncased` is a English model originally trained by batterydata. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_batteryscibert_uncased_en_5.5.0_3.0_1727202645434.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_batteryscibert_uncased_en_5.5.0_3.0_1727202645434.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_batteryscibert_uncased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_batteryscibert_uncased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_batteryscibert_uncased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|410.0 MB| + +## References + +https://huggingface.co/batterydata/batteryscibert-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_batteryscibert_uncased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_batteryscibert_uncased_pipeline_en.md new file mode 100644 index 00000000000000..0182dce4114fb8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_batteryscibert_uncased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_batteryscibert_uncased_pipeline pipeline BertSentenceEmbeddings from batterydata +author: John Snow Labs +name: sent_batteryscibert_uncased_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_batteryscibert_uncased_pipeline` is a English model originally trained by batterydata. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_batteryscibert_uncased_pipeline_en_5.5.0_3.0_1727202666128.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_batteryscibert_uncased_pipeline_en_5.5.0_3.0_1727202666128.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_batteryscibert_uncased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_batteryscibert_uncased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_batteryscibert_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|410.5 MB| + +## References + +https://huggingface.co/batterydata/batteryscibert-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_arabert_finetuned_mdeberta_tswana_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_arabert_finetuned_mdeberta_tswana_en.md new file mode 100644 index 00000000000000..4a0f757c1e27ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_arabert_finetuned_mdeberta_tswana_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_arabert_finetuned_mdeberta_tswana BertSentenceEmbeddings from betteib +author: John Snow Labs +name: sent_bert_base_arabert_finetuned_mdeberta_tswana +date: 2024-09-24 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_arabert_finetuned_mdeberta_tswana` is a English model originally trained by betteib. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_arabert_finetuned_mdeberta_tswana_en_5.5.0_3.0_1727202340808.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_arabert_finetuned_mdeberta_tswana_en_5.5.0_3.0_1727202340808.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_arabert_finetuned_mdeberta_tswana","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_arabert_finetuned_mdeberta_tswana","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_arabert_finetuned_mdeberta_tswana| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|504.6 MB| + +## References + +https://huggingface.co/betteib/bert-base-arabert-finetuned-mdeberta-tn \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_arabertv02_ar.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_arabertv02_ar.md new file mode 100644 index 00000000000000..5a2869fc5a660a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_arabertv02_ar.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Arabic sent_bert_base_arabertv02 BertSentenceEmbeddings from aubmindlab +author: John Snow Labs +name: sent_bert_base_arabertv02 +date: 2024-09-24 +tags: [ar, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_arabertv02` is a Arabic model originally trained by aubmindlab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_arabertv02_ar_5.5.0_3.0_1727202120487.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_arabertv02_ar_5.5.0_3.0_1727202120487.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_arabertv02","ar") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_arabertv02","ar") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_arabertv02| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|ar| +|Size:|505.1 MB| + +## References + +https://huggingface.co/aubmindlab/bert-base-arabertv02 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_arabertv02_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_arabertv02_pipeline_ar.md new file mode 100644 index 00000000000000..33620917c1020b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_arabertv02_pipeline_ar.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Arabic sent_bert_base_arabertv02_pipeline pipeline BertSentenceEmbeddings from aubmindlab +author: John Snow Labs +name: sent_bert_base_arabertv02_pipeline +date: 2024-09-24 +tags: [ar, open_source, pipeline, onnx] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_arabertv02_pipeline` is a Arabic model originally trained by aubmindlab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_arabertv02_pipeline_ar_5.5.0_3.0_1727202148000.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_arabertv02_pipeline_ar_5.5.0_3.0_1727202148000.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_arabertv02_pipeline", lang = "ar") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_arabertv02_pipeline", lang = "ar") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_arabertv02_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|505.6 MB| + +## References + +https://huggingface.co/aubmindlab/bert-base-arabertv02 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_blbooks_cased_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_blbooks_cased_en.md new file mode 100644 index 00000000000000..d80eed60f0fec9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_blbooks_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_blbooks_cased BertSentenceEmbeddings from bigscience-historical-texts +author: John Snow Labs +name: sent_bert_base_blbooks_cased +date: 2024-09-24 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_blbooks_cased` is a English model originally trained by bigscience-historical-texts. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_blbooks_cased_en_5.5.0_3.0_1727157729173.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_blbooks_cased_en_5.5.0_3.0_1727157729173.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_blbooks_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_blbooks_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_blbooks_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|412.3 MB| + +## References + +https://huggingface.co/bigscience-historical-texts/bert-base-blbooks-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_english_french_dutch_russian_arabic_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_english_french_dutch_russian_arabic_cased_pipeline_en.md new file mode 100644 index 00000000000000..e18ace2311611f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_english_french_dutch_russian_arabic_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_english_french_dutch_russian_arabic_cased_pipeline pipeline BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_french_dutch_russian_arabic_cased_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_french_dutch_russian_arabic_cased_pipeline` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_french_dutch_russian_arabic_cased_pipeline_en_5.5.0_3.0_1727202207349.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_french_dutch_russian_arabic_cased_pipeline_en_5.5.0_3.0_1727202207349.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_english_french_dutch_russian_arabic_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_english_french_dutch_russian_arabic_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_french_dutch_russian_arabic_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|462.1 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-fr-nl-ru-ar-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_nli_stsb_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_nli_stsb_en.md new file mode 100644 index 00000000000000..f50e41087ca203 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_nli_stsb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_nli_stsb BertSentenceEmbeddings from binwang +author: John Snow Labs +name: sent_bert_base_nli_stsb +date: 2024-09-24 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_nli_stsb` is a English model originally trained by binwang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_nli_stsb_en_5.5.0_3.0_1727202139914.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_nli_stsb_en_5.5.0_3.0_1727202139914.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_nli_stsb","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_nli_stsb","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_nli_stsb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/binwang/bert-base-nli-stsb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_uncased_2022_habana_test_3_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_uncased_2022_habana_test_3_en.md new file mode 100644 index 00000000000000..ab972f82eedca6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_uncased_2022_habana_test_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_2022_habana_test_3 BertSentenceEmbeddings from philschmid +author: John Snow Labs +name: sent_bert_base_uncased_2022_habana_test_3 +date: 2024-09-24 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_2022_habana_test_3` is a English model originally trained by philschmid. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_2022_habana_test_3_en_5.5.0_3.0_1727201932435.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_2022_habana_test_3_en_5.5.0_3.0_1727201932435.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_2022_habana_test_3","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_2022_habana_test_3","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_2022_habana_test_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|412.0 MB| + +## References + +https://huggingface.co/philschmid/bert-base-uncased-2022-habana-test-3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_uncased_2022_habana_test_3_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_uncased_2022_habana_test_3_pipeline_en.md new file mode 100644 index 00000000000000..8090ab5a2b51c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_base_uncased_2022_habana_test_3_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_2022_habana_test_3_pipeline pipeline BertSentenceEmbeddings from philschmid +author: John Snow Labs +name: sent_bert_base_uncased_2022_habana_test_3_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_2022_habana_test_3_pipeline` is a English model originally trained by philschmid. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_2022_habana_test_3_pipeline_en_5.5.0_3.0_1727201954160.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_2022_habana_test_3_pipeline_en_5.5.0_3.0_1727201954160.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_2022_habana_test_3_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_2022_habana_test_3_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_2022_habana_test_3_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|412.5 MB| + +## References + +https://huggingface.co/philschmid/bert-base-uncased-2022-habana-test-3 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_large_cased_portuguese_lenerbr_pipeline_pt.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_large_cased_portuguese_lenerbr_pipeline_pt.md new file mode 100644 index 00000000000000..a9ee304c2155c2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_large_cased_portuguese_lenerbr_pipeline_pt.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Portuguese sent_bert_large_cased_portuguese_lenerbr_pipeline pipeline BertSentenceEmbeddings from pierreguillou +author: John Snow Labs +name: sent_bert_large_cased_portuguese_lenerbr_pipeline +date: 2024-09-24 +tags: [pt, open_source, pipeline, onnx] +task: Embeddings +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_large_cased_portuguese_lenerbr_pipeline` is a Portuguese model originally trained by pierreguillou. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_portuguese_lenerbr_pipeline_pt_5.5.0_3.0_1727202540280.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_portuguese_lenerbr_pipeline_pt_5.5.0_3.0_1727202540280.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_large_cased_portuguese_lenerbr_pipeline", lang = "pt") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_large_cased_portuguese_lenerbr_pipeline", lang = "pt") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_large_cased_portuguese_lenerbr_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|pt| +|Size:|1.2 GB| + +## References + +https://huggingface.co/pierreguillou/bert-large-cased-pt-lenerbr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_large_cased_portuguese_lenerbr_pt.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_large_cased_portuguese_lenerbr_pt.md new file mode 100644 index 00000000000000..6061a322bd868d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_large_cased_portuguese_lenerbr_pt.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Portuguese sent_bert_large_cased_portuguese_lenerbr BertSentenceEmbeddings from pierreguillou +author: John Snow Labs +name: sent_bert_large_cased_portuguese_lenerbr +date: 2024-09-24 +tags: [pt, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_large_cased_portuguese_lenerbr` is a Portuguese model originally trained by pierreguillou. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_portuguese_lenerbr_pt_5.5.0_3.0_1727202476652.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_portuguese_lenerbr_pt_5.5.0_3.0_1727202476652.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_large_cased_portuguese_lenerbr","pt") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_large_cased_portuguese_lenerbr","pt") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_large_cased_portuguese_lenerbr| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|pt| +|Size:|1.2 GB| + +## References + +https://huggingface.co/pierreguillou/bert-large-cased-pt-lenerbr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_medium_mlsm_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_medium_mlsm_en.md new file mode 100644 index 00000000000000..49d041ff2a3a00 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_medium_mlsm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_medium_mlsm BertSentenceEmbeddings from SzegedAI +author: John Snow Labs +name: sent_bert_medium_mlsm +date: 2024-09-24 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_medium_mlsm` is a English model originally trained by SzegedAI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_medium_mlsm_en_5.5.0_3.0_1727178506274.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_medium_mlsm_en_5.5.0_3.0_1727178506274.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_medium_mlsm","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_medium_mlsm","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_medium_mlsm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|157.1 MB| + +## References + +https://huggingface.co/SzegedAI/bert-medium-mlsm \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline_fa.md b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline_fa.md new file mode 100644 index 00000000000000..79de17e25f8508 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline_fa.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Persian sent_bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline pipeline BertSentenceEmbeddings from Yasamansaffari73 +author: John Snow Labs +name: sent_bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline +date: 2024-09-24 +tags: [fa, open_source, pipeline, onnx] +task: Embeddings +language: fa +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline` is a Persian model originally trained by Yasamansaffari73. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline_fa_5.5.0_3.0_1727178564218.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline_fa_5.5.0_3.0_1727178564218.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline", lang = "fa") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline", lang = "fa") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|fa| +|Size:|607.1 MB| + +## References + +https://huggingface.co/Yasamansaffari73/bert-fa-base-uncased-finetuned-ParsBert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_danish_bert_iolariu_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_danish_bert_iolariu_en.md new file mode 100644 index 00000000000000..0010ac240fd067 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_danish_bert_iolariu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_danish_bert_iolariu BertSentenceEmbeddings from iolariu +author: John Snow Labs +name: sent_danish_bert_iolariu +date: 2024-09-24 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_danish_bert_iolariu` is a English model originally trained by iolariu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_danish_bert_iolariu_en_5.5.0_3.0_1727157488463.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_danish_bert_iolariu_en_5.5.0_3.0_1727157488463.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_danish_bert_iolariu","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_danish_bert_iolariu","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_danish_bert_iolariu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|406.0 MB| + +## References + +https://huggingface.co/iolariu/DA_BERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_furina_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_furina_en.md new file mode 100644 index 00000000000000..ec723ca77d0aeb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_furina_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_furina XlmRoBertaSentenceEmbeddings from yihongLiu +author: John Snow Labs +name: sent_furina +date: 2024-09-24 +tags: [en, open_source, onnx, sentence_embeddings, xlm_roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_furina` is a English model originally trained by yihongLiu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_furina_en_5.5.0_3.0_1727205845902.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_furina_en_5.5.0_3.0_1727205845902.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = XlmRoBertaSentenceEmbeddings.pretrained("sent_furina","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = XlmRoBertaSentenceEmbeddings.pretrained("sent_furina","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_furina| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|1.5 GB| + +## References + +https://huggingface.co/yihongLiu/furina \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_furina_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_furina_pipeline_en.md new file mode 100644 index 00000000000000..e909075711fe2b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_furina_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_furina_pipeline pipeline XlmRoBertaSentenceEmbeddings from yihongLiu +author: John Snow Labs +name: sent_furina_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_furina_pipeline` is a English model originally trained by yihongLiu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_furina_pipeline_en_5.5.0_3.0_1727205926922.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_furina_pipeline_en_5.5.0_3.0_1727205926922.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_furina_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_furina_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_furina_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.5 GB| + +## References + +https://huggingface.co/yihongLiu/furina + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- XlmRoBertaSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_less_300000_xlm_roberta_mmar_recipe_10_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_less_300000_xlm_roberta_mmar_recipe_10_en.md new file mode 100644 index 00000000000000..8d1b6386d8a74c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_less_300000_xlm_roberta_mmar_recipe_10_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_less_300000_xlm_roberta_mmar_recipe_10 XlmRoBertaSentenceEmbeddings from CennetOguz +author: John Snow Labs +name: sent_less_300000_xlm_roberta_mmar_recipe_10 +date: 2024-09-24 +tags: [en, open_source, onnx, sentence_embeddings, xlm_roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_less_300000_xlm_roberta_mmar_recipe_10` is a English model originally trained by CennetOguz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_less_300000_xlm_roberta_mmar_recipe_10_en_5.5.0_3.0_1727205527478.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_less_300000_xlm_roberta_mmar_recipe_10_en_5.5.0_3.0_1727205527478.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = XlmRoBertaSentenceEmbeddings.pretrained("sent_less_300000_xlm_roberta_mmar_recipe_10","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = XlmRoBertaSentenceEmbeddings.pretrained("sent_less_300000_xlm_roberta_mmar_recipe_10","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_less_300000_xlm_roberta_mmar_recipe_10| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/CennetOguz/less_300000_xlm_roberta_mmar_recipe_10 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_less_300000_xlm_roberta_mmar_recipe_10_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_less_300000_xlm_roberta_mmar_recipe_10_pipeline_en.md new file mode 100644 index 00000000000000..ef59e36c7042a8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_less_300000_xlm_roberta_mmar_recipe_10_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_less_300000_xlm_roberta_mmar_recipe_10_pipeline pipeline XlmRoBertaSentenceEmbeddings from CennetOguz +author: John Snow Labs +name: sent_less_300000_xlm_roberta_mmar_recipe_10_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_less_300000_xlm_roberta_mmar_recipe_10_pipeline` is a English model originally trained by CennetOguz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_less_300000_xlm_roberta_mmar_recipe_10_pipeline_en_5.5.0_3.0_1727205583101.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_less_300000_xlm_roberta_mmar_recipe_10_pipeline_en_5.5.0_3.0_1727205583101.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_less_300000_xlm_roberta_mmar_recipe_10_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_less_300000_xlm_roberta_mmar_recipe_10_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_less_300000_xlm_roberta_mmar_recipe_10_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/CennetOguz/less_300000_xlm_roberta_mmar_recipe_10 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- XlmRoBertaSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_s3_v1_20_epochs_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_s3_v1_20_epochs_en.md new file mode 100644 index 00000000000000..c699e07a46a6aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_s3_v1_20_epochs_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_s3_v1_20_epochs BertSentenceEmbeddings from AethiQs-Max +author: John Snow Labs +name: sent_s3_v1_20_epochs +date: 2024-09-24 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_s3_v1_20_epochs` is a English model originally trained by AethiQs-Max. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_s3_v1_20_epochs_en_5.5.0_3.0_1727202409357.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_s3_v1_20_epochs_en_5.5.0_3.0_1727202409357.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_s3_v1_20_epochs","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_s3_v1_20_epochs","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_s3_v1_20_epochs| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|406.8 MB| + +## References + +https://huggingface.co/AethiQs-Max/s3-v1-20_epochs \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_s3_v1_20_epochs_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_s3_v1_20_epochs_pipeline_en.md new file mode 100644 index 00000000000000..468ed0070d6a61 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_s3_v1_20_epochs_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_s3_v1_20_epochs_pipeline pipeline BertSentenceEmbeddings from AethiQs-Max +author: John Snow Labs +name: sent_s3_v1_20_epochs_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_s3_v1_20_epochs_pipeline` is a English model originally trained by AethiQs-Max. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_s3_v1_20_epochs_pipeline_en_5.5.0_3.0_1727202430246.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_s3_v1_20_epochs_pipeline_en_5.5.0_3.0_1727202430246.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_s3_v1_20_epochs_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_s3_v1_20_epochs_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_s3_v1_20_epochs_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.4 MB| + +## References + +https://huggingface.co/AethiQs-Max/s3-v1-20_epochs + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_turkish_mini_bert_uncased_pipeline_tr.md b/docs/_posts/ahmedlone127/2024-09-24-sent_turkish_mini_bert_uncased_pipeline_tr.md new file mode 100644 index 00000000000000..c38c57cb803312 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_turkish_mini_bert_uncased_pipeline_tr.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Turkish sent_turkish_mini_bert_uncased_pipeline pipeline BertSentenceEmbeddings from ytu-ce-cosmos +author: John Snow Labs +name: sent_turkish_mini_bert_uncased_pipeline +date: 2024-09-24 +tags: [tr, open_source, pipeline, onnx] +task: Embeddings +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_turkish_mini_bert_uncased_pipeline` is a Turkish model originally trained by ytu-ce-cosmos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_turkish_mini_bert_uncased_pipeline_tr_5.5.0_3.0_1727202577142.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_turkish_mini_bert_uncased_pipeline_tr_5.5.0_3.0_1727202577142.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_turkish_mini_bert_uncased_pipeline", lang = "tr") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_turkish_mini_bert_uncased_pipeline", lang = "tr") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_turkish_mini_bert_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|tr| +|Size:|43.8 MB| + +## References + +https://huggingface.co/ytu-ce-cosmos/turkish-mini-bert-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_en.md new file mode 100644 index 00000000000000..e5a14d4fa671fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned XlmRoBertaSentenceEmbeddings from RogerB +author: John Snow Labs +name: sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned +date: 2024-09-24 +tags: [en, open_source, onnx, sentence_embeddings, xlm_roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned` is a English model originally trained by RogerB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_en_5.5.0_3.0_1727205637920.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_en_5.5.0_3.0_1727205637920.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = XlmRoBertaSentenceEmbeddings.pretrained("sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = XlmRoBertaSentenceEmbeddings.pretrained("sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/RogerB/xlm-roberta-base-finetuned-kinyarwanda-kin-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..9eeb31a0cc744f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_pipeline pipeline XlmRoBertaSentenceEmbeddings from RogerB +author: John Snow Labs +name: sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_pipeline` is a English model originally trained by RogerB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_pipeline_en_5.5.0_3.0_1727205705111.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_pipeline_en_5.5.0_3.0_1727205705111.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_xlm_roberta_base_finetuned_kinyarwanda_kinyarwanda_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/RogerB/xlm-roberta-base-finetuned-kinyarwanda-kin-finetuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- XlmRoBertaSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_xlm_v_base_trimmed_italian_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_xlm_v_base_trimmed_italian_en.md new file mode 100644 index 00000000000000..9a33209fdcf372 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_xlm_v_base_trimmed_italian_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_xlm_v_base_trimmed_italian XlmRoBertaSentenceEmbeddings from vocabtrimmer +author: John Snow Labs +name: sent_xlm_v_base_trimmed_italian +date: 2024-09-24 +tags: [en, open_source, onnx, sentence_embeddings, xlm_roberta] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_xlm_v_base_trimmed_italian` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_xlm_v_base_trimmed_italian_en_5.5.0_3.0_1727205485563.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_xlm_v_base_trimmed_italian_en_5.5.0_3.0_1727205485563.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = XlmRoBertaSentenceEmbeddings.pretrained("sent_xlm_v_base_trimmed_italian","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = XlmRoBertaSentenceEmbeddings.pretrained("sent_xlm_v_base_trimmed_italian","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_xlm_v_base_trimmed_italian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|526.3 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-v-base-trimmed-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_xlm_v_base_trimmed_italian_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_xlm_v_base_trimmed_italian_pipeline_en.md new file mode 100644 index 00000000000000..0957f778649f25 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_xlm_v_base_trimmed_italian_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_xlm_v_base_trimmed_italian_pipeline pipeline XlmRoBertaSentenceEmbeddings from vocabtrimmer +author: John Snow Labs +name: sent_xlm_v_base_trimmed_italian_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_xlm_v_base_trimmed_italian_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_xlm_v_base_trimmed_italian_pipeline_en_5.5.0_3.0_1727205639190.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_xlm_v_base_trimmed_italian_pipeline_en_5.5.0_3.0_1727205639190.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_xlm_v_base_trimmed_italian_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_xlm_v_base_trimmed_italian_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_xlm_v_base_trimmed_italian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|526.8 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-v-base-trimmed-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- XlmRoBertaSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sent_xlm_v_base_trimmed_portuguese_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sent_xlm_v_base_trimmed_portuguese_pipeline_en.md new file mode 100644 index 00000000000000..a7b5329fe3c91d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sent_xlm_v_base_trimmed_portuguese_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_xlm_v_base_trimmed_portuguese_pipeline pipeline XlmRoBertaSentenceEmbeddings from vocabtrimmer +author: John Snow Labs +name: sent_xlm_v_base_trimmed_portuguese_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_xlm_v_base_trimmed_portuguese_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_xlm_v_base_trimmed_portuguese_pipeline_en_5.5.0_3.0_1727205808735.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_xlm_v_base_trimmed_portuguese_pipeline_en_5.5.0_3.0_1727205808735.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_xlm_v_base_trimmed_portuguese_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_xlm_v_base_trimmed_portuguese_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_xlm_v_base_trimmed_portuguese_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|520.9 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-v-base-trimmed-pt + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- XlmRoBertaSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sentiment_analysis_model_mahmoud8_en.md b/docs/_posts/ahmedlone127/2024-09-24-sentiment_analysis_model_mahmoud8_en.md new file mode 100644 index 00000000000000..6ef38a67352f35 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sentiment_analysis_model_mahmoud8_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_analysis_model_mahmoud8 DistilBertForSequenceClassification from Mahmoud8 +author: John Snow Labs +name: sentiment_analysis_model_mahmoud8 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_model_mahmoud8` is a English model originally trained by Mahmoud8. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_model_mahmoud8_en_5.5.0_3.0_1727154821588.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_model_mahmoud8_en_5.5.0_3.0_1727154821588.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentiment_analysis_model_mahmoud8","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("sentiment_analysis_model_mahmoud8", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_model_mahmoud8| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Mahmoud8/sentiment_analysis_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sentiment_analysis_with_distilbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sentiment_analysis_with_distilbert_pipeline_en.md new file mode 100644 index 00000000000000..353da1286909f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sentiment_analysis_with_distilbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_analysis_with_distilbert_pipeline pipeline DistilBertForSequenceClassification from hdv2709 +author: John Snow Labs +name: sentiment_analysis_with_distilbert_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_analysis_with_distilbert_pipeline` is a English model originally trained by hdv2709. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_analysis_with_distilbert_pipeline_en_5.5.0_3.0_1727137059738.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_analysis_with_distilbert_pipeline_en_5.5.0_3.0_1727137059738.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentiment_analysis_with_distilbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentiment_analysis_with_distilbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_analysis_with_distilbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/hdv2709/sentiment_analysis_with_DistilBERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-services_ucacue_bryansagbay_en.md b/docs/_posts/ahmedlone127/2024-09-24-services_ucacue_bryansagbay_en.md new file mode 100644 index 00000000000000..916048d7fa54b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-services_ucacue_bryansagbay_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English services_ucacue_bryansagbay RoBertaForSequenceClassification from BryanSagbay +author: John Snow Labs +name: services_ucacue_bryansagbay +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`services_ucacue_bryansagbay` is a English model originally trained by BryanSagbay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/services_ucacue_bryansagbay_en_5.5.0_3.0_1727171716556.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/services_ucacue_bryansagbay_en_5.5.0_3.0_1727171716556.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("services_ucacue_bryansagbay","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("services_ucacue_bryansagbay", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|services_ucacue_bryansagbay| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|445.8 MB| + +## References + +https://huggingface.co/BryanSagbay/services-ucacue \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sgppellow_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sgppellow_pipeline_en.md new file mode 100644 index 00000000000000..15dee56749bd7b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sgppellow_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sgppellow_pipeline pipeline RoBertaForSequenceClassification from SGPPellow +author: John Snow Labs +name: sgppellow_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sgppellow_pipeline` is a English model originally trained by SGPPellow. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sgppellow_pipeline_en_5.5.0_3.0_1727171097734.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sgppellow_pipeline_en_5.5.0_3.0_1727171097734.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sgppellow_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sgppellow_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sgppellow_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|416.2 MB| + +## References + +https://huggingface.co/SGPPellow/SGPPellow + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-spillage_distilbert_base_uncased_en.md b/docs/_posts/ahmedlone127/2024-09-24-spillage_distilbert_base_uncased_en.md new file mode 100644 index 00000000000000..1a8851d67586fd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-spillage_distilbert_base_uncased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English spillage_distilbert_base_uncased DistilBertForSequenceClassification from chuuhtetnaing +author: John Snow Labs +name: spillage_distilbert_base_uncased +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`spillage_distilbert_base_uncased` is a English model originally trained by chuuhtetnaing. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spillage_distilbert_base_uncased_en_5.5.0_3.0_1727164741410.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spillage_distilbert_base_uncased_en_5.5.0_3.0_1727164741410.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("spillage_distilbert_base_uncased","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("spillage_distilbert_base_uncased", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spillage_distilbert_base_uncased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/chuuhtetnaing/spillage-distilbert-base-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-spillage_distilbert_base_uncased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-spillage_distilbert_base_uncased_pipeline_en.md new file mode 100644 index 00000000000000..be83df8fa1fc97 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-spillage_distilbert_base_uncased_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English spillage_distilbert_base_uncased_pipeline pipeline DistilBertForSequenceClassification from chuuhtetnaing +author: John Snow Labs +name: spillage_distilbert_base_uncased_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`spillage_distilbert_base_uncased_pipeline` is a English model originally trained by chuuhtetnaing. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/spillage_distilbert_base_uncased_pipeline_en_5.5.0_3.0_1727164756348.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/spillage_distilbert_base_uncased_pipeline_en_5.5.0_3.0_1727164756348.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("spillage_distilbert_base_uncased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("spillage_distilbert_base_uncased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|spillage_distilbert_base_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/chuuhtetnaing/spillage-distilbert-base-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-squeezebert_uncased_finetuned_squad_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-squeezebert_uncased_finetuned_squad_pipeline_en.md new file mode 100644 index 00000000000000..791ffde689e51c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-squeezebert_uncased_finetuned_squad_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English squeezebert_uncased_finetuned_squad_pipeline pipeline BertForQuestionAnswering from SupriyaArun +author: John Snow Labs +name: squeezebert_uncased_finetuned_squad_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`squeezebert_uncased_finetuned_squad_pipeline` is a English model originally trained by SupriyaArun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/squeezebert_uncased_finetuned_squad_pipeline_en_5.5.0_3.0_1727206792798.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/squeezebert_uncased_finetuned_squad_pipeline_en_5.5.0_3.0_1727206792798.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("squeezebert_uncased_finetuned_squad_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("squeezebert_uncased_finetuned_squad_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|squeezebert_uncased_finetuned_squad_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|187.4 MB| + +## References + +https://huggingface.co/SupriyaArun/squeezebert-uncased-finetuned-squad + +## Included Models + +- MultiDocumentAssembler +- BertForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sroberta_hr.md b/docs/_posts/ahmedlone127/2024-09-24-sroberta_hr.md new file mode 100644 index 00000000000000..b921b01841f2fe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sroberta_hr.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Croatian sroberta RoBertaEmbeddings from Andrija +author: John Snow Labs +name: sroberta +date: 2024-09-24 +tags: [hr, open_source, onnx, embeddings, roberta] +task: Embeddings +language: hr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sroberta` is a Croatian model originally trained by Andrija. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sroberta_hr_5.5.0_3.0_1727216187649.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sroberta_hr_5.5.0_3.0_1727216187649.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = RoBertaEmbeddings.pretrained("sroberta","hr") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = RoBertaEmbeddings.pretrained("sroberta","hr") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sroberta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[roberta]| +|Language:|hr| +|Size:|450.7 MB| + +## References + +https://huggingface.co/Andrija/SRoBERTa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sroberta_pipeline_hr.md b/docs/_posts/ahmedlone127/2024-09-24-sroberta_pipeline_hr.md new file mode 100644 index 00000000000000..b118fcb6a362cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sroberta_pipeline_hr.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Croatian sroberta_pipeline pipeline RoBertaEmbeddings from Andrija +author: John Snow Labs +name: sroberta_pipeline +date: 2024-09-24 +tags: [hr, open_source, pipeline, onnx] +task: Embeddings +language: hr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sroberta_pipeline` is a Croatian model originally trained by Andrija. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sroberta_pipeline_hr_5.5.0_3.0_1727216211408.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sroberta_pipeline_hr_5.5.0_3.0_1727216211408.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sroberta_pipeline", lang = "hr") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sroberta_pipeline", lang = "hr") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sroberta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hr| +|Size:|450.8 MB| + +## References + +https://huggingface.co/Andrija/SRoBERTa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31_en.md b/docs/_posts/ahmedlone127/2024-09-24-stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31_en.md new file mode 100644 index 00000000000000..86e2e0e2cc4099 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31 DistilBertForSequenceClassification from jvelja +author: John Snow Labs +name: stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31` is a English model originally trained by jvelja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31_en_5.5.0_3.0_1727137393609.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31_en_5.5.0_3.0_1727137393609.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stego_classifier_checkpoint_epoch_0_2024_07_26_16_19_31| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/jvelja/stego-classifier-checkpoint-epoch-0-2024-07-26_16-19-31 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-subtopics_bigbird_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-subtopics_bigbird_base_pipeline_en.md new file mode 100644 index 00000000000000..26ed1d9a6d631b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-subtopics_bigbird_base_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English subtopics_bigbird_base_pipeline pipeline RoBertaForSequenceClassification from RogerKam +author: John Snow Labs +name: subtopics_bigbird_base_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`subtopics_bigbird_base_pipeline` is a English model originally trained by RogerKam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/subtopics_bigbird_base_pipeline_en_5.5.0_3.0_1727167913886.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/subtopics_bigbird_base_pipeline_en_5.5.0_3.0_1727167913886.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("subtopics_bigbird_base_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("subtopics_bigbird_base_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|subtopics_bigbird_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|436.6 MB| + +## References + +https://huggingface.co/RogerKam/subTopics-bigBird-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-sucidal_text_classification_distillbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-sucidal_text_classification_distillbert_pipeline_en.md new file mode 100644 index 00000000000000..ac60a2665dfb4f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-sucidal_text_classification_distillbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sucidal_text_classification_distillbert_pipeline pipeline DistilBertForSequenceClassification from pradanaadn +author: John Snow Labs +name: sucidal_text_classification_distillbert_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sucidal_text_classification_distillbert_pipeline` is a English model originally trained by pradanaadn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sucidal_text_classification_distillbert_pipeline_en_5.5.0_3.0_1727136840942.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sucidal_text_classification_distillbert_pipeline_en_5.5.0_3.0_1727136840942.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sucidal_text_classification_distillbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sucidal_text_classification_distillbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sucidal_text_classification_distillbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/pradanaadn/sucidal-text-classification-distillbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-terjman_large_ar.md b/docs/_posts/ahmedlone127/2024-09-24-terjman_large_ar.md new file mode 100644 index 00000000000000..1a1057c65bf371 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-terjman_large_ar.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Arabic terjman_large MarianTransformer from atlasia +author: John Snow Labs +name: terjman_large +date: 2024-09-24 +tags: [ar, open_source, onnx, translation, marian] +task: Translation +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: MarianTransformer +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`terjman_large` is a Arabic model originally trained by atlasia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/terjman_large_ar_5.5.0_3.0_1727208921086.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/terjman_large_ar_5.5.0_3.0_1727208921086.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("translation") + +marian = MarianTransformer.pretrained("terjman_large","ar") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, marian]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val marian = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = MarianTransformer.pretrained("terjman_large","ar") + .setInputCols(Array("sentence")) + .setOutputCol("translation") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, marian)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|terjman_large| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentences]| +|Output Labels:|[translation]| +|Language:|ar| +|Size:|695.3 MB| + +## References + +https://huggingface.co/atlasia/Terjman-Large \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-terjman_large_pipeline_ar.md b/docs/_posts/ahmedlone127/2024-09-24-terjman_large_pipeline_ar.md new file mode 100644 index 00000000000000..2afedf4f21e57b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-terjman_large_pipeline_ar.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Arabic terjman_large_pipeline pipeline MarianTransformer from atlasia +author: John Snow Labs +name: terjman_large_pipeline +date: 2024-09-24 +tags: [ar, open_source, pipeline, onnx] +task: Translation +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained MarianTransformer, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`terjman_large_pipeline` is a Arabic model originally trained by atlasia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/terjman_large_pipeline_ar_5.5.0_3.0_1727209153955.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/terjman_large_pipeline_ar_5.5.0_3.0_1727209153955.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("terjman_large_pipeline", lang = "ar") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("terjman_large_pipeline", lang = "ar") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|terjman_large_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ar| +|Size:|695.8 MB| + +## References + +https://huggingface.co/atlasia/Terjman-Large + +## Included Models + +- DocumentAssembler +- SentenceDetectorDLModel +- MarianTransformer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-test_nepal_bhasa_study_roberta_large_two_way_en.md b/docs/_posts/ahmedlone127/2024-09-24-test_nepal_bhasa_study_roberta_large_two_way_en.md new file mode 100644 index 00000000000000..7cd1b3f7fbdcc3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-test_nepal_bhasa_study_roberta_large_two_way_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test_nepal_bhasa_study_roberta_large_two_way RoBertaForSequenceClassification from xiazeng +author: John Snow Labs +name: test_nepal_bhasa_study_roberta_large_two_way +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_nepal_bhasa_study_roberta_large_two_way` is a English model originally trained by xiazeng. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_nepal_bhasa_study_roberta_large_two_way_en_5.5.0_3.0_1727172164564.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_nepal_bhasa_study_roberta_large_two_way_en_5.5.0_3.0_1727172164564.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("test_nepal_bhasa_study_roberta_large_two_way","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("test_nepal_bhasa_study_roberta_large_two_way", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_nepal_bhasa_study_roberta_large_two_way| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/xiazeng/test-new-study_roberta-large_two-way \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-test_nepal_bhasa_study_roberta_large_two_way_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-test_nepal_bhasa_study_roberta_large_two_way_pipeline_en.md new file mode 100644 index 00000000000000..d2bf81088a8a22 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-test_nepal_bhasa_study_roberta_large_two_way_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English test_nepal_bhasa_study_roberta_large_two_way_pipeline pipeline RoBertaForSequenceClassification from xiazeng +author: John Snow Labs +name: test_nepal_bhasa_study_roberta_large_two_way_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_nepal_bhasa_study_roberta_large_two_way_pipeline` is a English model originally trained by xiazeng. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_nepal_bhasa_study_roberta_large_two_way_pipeline_en_5.5.0_3.0_1727172233741.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_nepal_bhasa_study_roberta_large_two_way_pipeline_en_5.5.0_3.0_1727172233741.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test_nepal_bhasa_study_roberta_large_two_way_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test_nepal_bhasa_study_roberta_large_two_way_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_nepal_bhasa_study_roberta_large_two_way_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/xiazeng/test-new-study_roberta-large_two-way + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-tiny_random_debertafortokenclassification_en.md b/docs/_posts/ahmedlone127/2024-09-24-tiny_random_debertafortokenclassification_en.md new file mode 100644 index 00000000000000..94d68c7c71c3e5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-tiny_random_debertafortokenclassification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tiny_random_debertafortokenclassification BertForTokenClassification from hf-tiny-model-private +author: John Snow Labs +name: tiny_random_debertafortokenclassification +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tiny_random_debertafortokenclassification` is a English model originally trained by hf-tiny-model-private. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tiny_random_debertafortokenclassification_en_5.5.0_3.0_1727203363980.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tiny_random_debertafortokenclassification_en_5.5.0_3.0_1727203363980.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("tiny_random_debertafortokenclassification","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("tiny_random_debertafortokenclassification", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tiny_random_debertafortokenclassification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|346.1 KB| + +## References + +https://huggingface.co/hf-tiny-model-private/tiny-random-DebertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-tiny_random_debertafortokenclassification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-tiny_random_debertafortokenclassification_pipeline_en.md new file mode 100644 index 00000000000000..18d50b81660b69 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-tiny_random_debertafortokenclassification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tiny_random_debertafortokenclassification_pipeline pipeline BertForTokenClassification from hf-tiny-model-private +author: John Snow Labs +name: tiny_random_debertafortokenclassification_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tiny_random_debertafortokenclassification_pipeline` is a English model originally trained by hf-tiny-model-private. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tiny_random_debertafortokenclassification_pipeline_en_5.5.0_3.0_1727203364394.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tiny_random_debertafortokenclassification_pipeline_en_5.5.0_3.0_1727203364394.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tiny_random_debertafortokenclassification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tiny_random_debertafortokenclassification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tiny_random_debertafortokenclassification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|368.3 KB| + +## References + +https://huggingface.co/hf-tiny-model-private/tiny-random-DebertaForTokenClassification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-tinybert_phishing_model_en.md b/docs/_posts/ahmedlone127/2024-09-24-tinybert_phishing_model_en.md new file mode 100644 index 00000000000000..e978b46978ccdf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-tinybert_phishing_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tinybert_phishing_model BertForSequenceClassification from rpg1 +author: John Snow Labs +name: tinybert_phishing_model +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tinybert_phishing_model` is a English model originally trained by rpg1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tinybert_phishing_model_en_5.5.0_3.0_1727219397029.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tinybert_phishing_model_en_5.5.0_3.0_1727219397029.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("tinybert_phishing_model","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("tinybert_phishing_model", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tinybert_phishing_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|54.2 MB| + +## References + +https://huggingface.co/rpg1/tinyBERT_phishing_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-tinybert_sentiment_amazon_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-tinybert_sentiment_amazon_pipeline_en.md new file mode 100644 index 00000000000000..aae23b9155ebbf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-tinybert_sentiment_amazon_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tinybert_sentiment_amazon_pipeline pipeline BertForSequenceClassification from AdamCodd +author: John Snow Labs +name: tinybert_sentiment_amazon_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tinybert_sentiment_amazon_pipeline` is a English model originally trained by AdamCodd. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tinybert_sentiment_amazon_pipeline_en_5.5.0_3.0_1727149427633.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tinybert_sentiment_amazon_pipeline_en_5.5.0_3.0_1727149427633.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tinybert_sentiment_amazon_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tinybert_sentiment_amazon_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tinybert_sentiment_amazon_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|16.8 MB| + +## References + +https://huggingface.co/AdamCodd/tinybert-sentiment-amazon + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-tinyroberta_squad2_en.md b/docs/_posts/ahmedlone127/2024-09-24-tinyroberta_squad2_en.md new file mode 100644 index 00000000000000..5c95a32f858964 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-tinyroberta_squad2_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English tinyroberta_squad2 RoBertaForQuestionAnswering from JohnDoe70 +author: John Snow Labs +name: tinyroberta_squad2 +date: 2024-09-24 +tags: [en, open_source, onnx, question_answering, roberta] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tinyroberta_squad2` is a English model originally trained by JohnDoe70. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tinyroberta_squad2_en_5.5.0_3.0_1727210789171.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tinyroberta_squad2_en_5.5.0_3.0_1727210789171.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = RoBertaForQuestionAnswering.pretrained("tinyroberta_squad2","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = RoBertaForQuestionAnswering.pretrained("tinyroberta_squad2", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tinyroberta_squad2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|306.9 MB| + +## References + +https://huggingface.co/JohnDoe70/tinyroberta-squad2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-tinyroberta_squad2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-tinyroberta_squad2_pipeline_en.md new file mode 100644 index 00000000000000..50cbb4cd701f9e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-tinyroberta_squad2_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English tinyroberta_squad2_pipeline pipeline RoBertaForQuestionAnswering from JohnDoe70 +author: John Snow Labs +name: tinyroberta_squad2_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForQuestionAnswering, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tinyroberta_squad2_pipeline` is a English model originally trained by JohnDoe70. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tinyroberta_squad2_pipeline_en_5.5.0_3.0_1727210805073.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tinyroberta_squad2_pipeline_en_5.5.0_3.0_1727210805073.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tinyroberta_squad2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tinyroberta_squad2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tinyroberta_squad2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|306.9 MB| + +## References + +https://huggingface.co/JohnDoe70/tinyroberta-squad2 + +## Included Models + +- MultiDocumentAssembler +- RoBertaForQuestionAnswering \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-tmp_trainer_parth49_en.md b/docs/_posts/ahmedlone127/2024-09-24-tmp_trainer_parth49_en.md new file mode 100644 index 00000000000000..6234a48fd0345a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-tmp_trainer_parth49_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tmp_trainer_parth49 DistilBertForSequenceClassification from Parth49 +author: John Snow Labs +name: tmp_trainer_parth49 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tmp_trainer_parth49` is a English model originally trained by Parth49. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tmp_trainer_parth49_en_5.5.0_3.0_1727154770677.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tmp_trainer_parth49_en_5.5.0_3.0_1727154770677.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("tmp_trainer_parth49","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("tmp_trainer_parth49", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tmp_trainer_parth49| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/Parth49/tmp_trainer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_en.md b/docs/_posts/ahmedlone127/2024-09-24-tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_en.md new file mode 100644 index 00000000000000..14b7a68d7b9f2c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized XlmRoBertaForTokenClassification from anonymoussubmissions +author: John Snow Labs +name: tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized` is a English model originally trained by anonymoussubmissions. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_en_5.5.0_3.0_1727214968307.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_en_5.5.0_3.0_1727214968307.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/anonymoussubmissions/tner-xlm-roberta-base-ontonotes5-earnings21-non-normalized \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_pipeline_en.md new file mode 100644 index 00000000000000..c40b8229bff916 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_pipeline pipeline XlmRoBertaForTokenClassification from anonymoussubmissions +author: John Snow Labs +name: tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_pipeline` is a English model originally trained by anonymoussubmissions. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_pipeline_en_5.5.0_3.0_1727215035397.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_pipeline_en_5.5.0_3.0_1727215035397.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tner_xlm_roberta_base_ontonotes5_earnings21_non_normalized_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|858.3 MB| + +## References + +https://huggingface.co/anonymoussubmissions/tner-xlm-roberta-base-ontonotes5-earnings21-non-normalized + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-transcript_classification_en.md b/docs/_posts/ahmedlone127/2024-09-24-transcript_classification_en.md new file mode 100644 index 00000000000000..6dfac0b9477952 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-transcript_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English transcript_classification DistilBertForSequenceClassification from aoshita +author: John Snow Labs +name: transcript_classification +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, distilbert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: DistilBertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`transcript_classification` is a English model originally trained by aoshita. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/transcript_classification_en_5.5.0_3.0_1727154549754.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/transcript_classification_en_5.5.0_3.0_1727154549754.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("transcript_classification","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("transcript_classification", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|transcript_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|249.5 MB| + +## References + +https://huggingface.co/aoshita/transcript_classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-trial_model_qstrats_en.md b/docs/_posts/ahmedlone127/2024-09-24-trial_model_qstrats_en.md new file mode 100644 index 00000000000000..6b694f31b622a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-trial_model_qstrats_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English trial_model_qstrats RoBertaForSequenceClassification from qstrats +author: John Snow Labs +name: trial_model_qstrats +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trial_model_qstrats` is a English model originally trained by qstrats. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trial_model_qstrats_en_5.5.0_3.0_1727167479675.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trial_model_qstrats_en_5.5.0_3.0_1727167479675.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("trial_model_qstrats","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("trial_model_qstrats", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trial_model_qstrats| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|416.3 MB| + +## References + +https://huggingface.co/qstrats/trial-model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-tuned_test_trainer_bert_base_uncased_mrredborne_en.md b/docs/_posts/ahmedlone127/2024-09-24-tuned_test_trainer_bert_base_uncased_mrredborne_en.md new file mode 100644 index 00000000000000..13e6687a7088f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-tuned_test_trainer_bert_base_uncased_mrredborne_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tuned_test_trainer_bert_base_uncased_mrredborne BertForSequenceClassification from Mrredborne +author: John Snow Labs +name: tuned_test_trainer_bert_base_uncased_mrredborne +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tuned_test_trainer_bert_base_uncased_mrredborne` is a English model originally trained by Mrredborne. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tuned_test_trainer_bert_base_uncased_mrredborne_en_5.5.0_3.0_1727213463146.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tuned_test_trainer_bert_base_uncased_mrredborne_en_5.5.0_3.0_1727213463146.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("tuned_test_trainer_bert_base_uncased_mrredborne","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("tuned_test_trainer_bert_base_uncased_mrredborne", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tuned_test_trainer_bert_base_uncased_mrredborne| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Mrredborne/tuned_test_trainer-bert-base-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-tuned_test_trainer_bert_base_uncased_mrredborne_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-tuned_test_trainer_bert_base_uncased_mrredborne_pipeline_en.md new file mode 100644 index 00000000000000..aa86e25f957c6e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-tuned_test_trainer_bert_base_uncased_mrredborne_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tuned_test_trainer_bert_base_uncased_mrredborne_pipeline pipeline BertForSequenceClassification from Mrredborne +author: John Snow Labs +name: tuned_test_trainer_bert_base_uncased_mrredborne_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tuned_test_trainer_bert_base_uncased_mrredborne_pipeline` is a English model originally trained by Mrredborne. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tuned_test_trainer_bert_base_uncased_mrredborne_pipeline_en_5.5.0_3.0_1727213484226.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tuned_test_trainer_bert_base_uncased_mrredborne_pipeline_en_5.5.0_3.0_1727213484226.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tuned_test_trainer_bert_base_uncased_mrredborne_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tuned_test_trainer_bert_base_uncased_mrredborne_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tuned_test_trainer_bert_base_uncased_mrredborne_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Mrredborne/tuned_test_trainer-bert-base-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-tuning_lr_0_1_wd_0_01_epochs_1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-tuning_lr_0_1_wd_0_01_epochs_1_pipeline_en.md new file mode 100644 index 00000000000000..fd9729b968b78d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-tuning_lr_0_1_wd_0_01_epochs_1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tuning_lr_0_1_wd_0_01_epochs_1_pipeline pipeline DistilBertForSequenceClassification from ash-akjp-ga +author: John Snow Labs +name: tuning_lr_0_1_wd_0_01_epochs_1_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tuning_lr_0_1_wd_0_01_epochs_1_pipeline` is a English model originally trained by ash-akjp-ga. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tuning_lr_0_1_wd_0_01_epochs_1_pipeline_en_5.5.0_3.0_1727164669526.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tuning_lr_0_1_wd_0_01_epochs_1_pipeline_en_5.5.0_3.0_1727164669526.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tuning_lr_0_1_wd_0_01_epochs_1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tuning_lr_0_1_wd_0_01_epochs_1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tuning_lr_0_1_wd_0_01_epochs_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|251.1 MB| + +## References + +https://huggingface.co/ash-akjp-ga/tuning_lr_0.1_wd_0.01_epochs_1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- DistilBertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-twitter_roberta_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-twitter_roberta_base_pipeline_en.md new file mode 100644 index 00000000000000..02c9ec233bfda9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-twitter_roberta_base_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English twitter_roberta_base_pipeline pipeline RoBertaEmbeddings from cardiffnlp +author: John Snow Labs +name: twitter_roberta_base_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitter_roberta_base_pipeline` is a English model originally trained by cardiffnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_pipeline_en_5.5.0_3.0_1727216074828.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_roberta_base_pipeline_en_5.5.0_3.0_1727216074828.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("twitter_roberta_base_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("twitter_roberta_base_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_roberta_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.9 MB| + +## References + +https://huggingface.co/cardiffnlp/twitter-roberta-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-wav2vec2_base_igbo_en.md b/docs/_posts/ahmedlone127/2024-09-24-wav2vec2_base_igbo_en.md new file mode 100644 index 00000000000000..bca3fda5eb2bcd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-wav2vec2_base_igbo_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English wav2vec2_base_igbo WhisperForCTC from Msughterx +author: John Snow Labs +name: wav2vec2_base_igbo +date: 2024-09-24 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`wav2vec2_base_igbo` is a English model originally trained by Msughterx. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/wav2vec2_base_igbo_en_5.5.0_3.0_1727145263148.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/wav2vec2_base_igbo_en_5.5.0_3.0_1727145263148.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("wav2vec2_base_igbo","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("wav2vec2_base_igbo", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|wav2vec2_base_igbo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/Msughterx/wav2vec2-base-igbo \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-whisper_medium_with_google_fleurs_arabic_4000_steps_en.md b/docs/_posts/ahmedlone127/2024-09-24-whisper_medium_with_google_fleurs_arabic_4000_steps_en.md new file mode 100644 index 00000000000000..9dd3f2964dfefb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-whisper_medium_with_google_fleurs_arabic_4000_steps_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_medium_with_google_fleurs_arabic_4000_steps WhisperForCTC from MohammadJamalaldeen +author: John Snow Labs +name: whisper_medium_with_google_fleurs_arabic_4000_steps +date: 2024-09-24 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_medium_with_google_fleurs_arabic_4000_steps` is a English model originally trained by MohammadJamalaldeen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_medium_with_google_fleurs_arabic_4000_steps_en_5.5.0_3.0_1727144470487.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_medium_with_google_fleurs_arabic_4000_steps_en_5.5.0_3.0_1727144470487.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_medium_with_google_fleurs_arabic_4000_steps","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_medium_with_google_fleurs_arabic_4000_steps", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_medium_with_google_fleurs_arabic_4000_steps| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|4.8 GB| + +## References + +https://huggingface.co/MohammadJamalaldeen/whisper-medium-with-google-fleurs-ar-4000_steps \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-whisper_small_portuguese_pedropauletti_pt.md b/docs/_posts/ahmedlone127/2024-09-24-whisper_small_portuguese_pedropauletti_pt.md new file mode 100644 index 00000000000000..59652cf79487f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-whisper_small_portuguese_pedropauletti_pt.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Portuguese whisper_small_portuguese_pedropauletti WhisperForCTC from pedropauletti +author: John Snow Labs +name: whisper_small_portuguese_pedropauletti +date: 2024-09-24 +tags: [pt, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_portuguese_pedropauletti` is a Portuguese model originally trained by pedropauletti. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_portuguese_pedropauletti_pt_5.5.0_3.0_1727194190826.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_portuguese_pedropauletti_pt_5.5.0_3.0_1727194190826.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_portuguese_pedropauletti","pt") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_portuguese_pedropauletti", "pt") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_portuguese_pedropauletti| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|pt| +|Size:|1.7 GB| + +## References + +https://huggingface.co/pedropauletti/whisper-small-pt \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-wikibert_base_parsinlu_entailment_fa.md b/docs/_posts/ahmedlone127/2024-09-24-wikibert_base_parsinlu_entailment_fa.md new file mode 100644 index 00000000000000..dd508a63958234 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-wikibert_base_parsinlu_entailment_fa.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Persian wikibert_base_parsinlu_entailment BertForSequenceClassification from persiannlp +author: John Snow Labs +name: wikibert_base_parsinlu_entailment +date: 2024-09-24 +tags: [fa, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: fa +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`wikibert_base_parsinlu_entailment` is a Persian model originally trained by persiannlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/wikibert_base_parsinlu_entailment_fa_5.5.0_3.0_1727219331306.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/wikibert_base_parsinlu_entailment_fa_5.5.0_3.0_1727219331306.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("wikibert_base_parsinlu_entailment","fa") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("wikibert_base_parsinlu_entailment", "fa") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|wikibert_base_parsinlu_entailment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|fa| +|Size:|380.3 MB| + +## References + +https://huggingface.co/persiannlp/wikibert-base-parsinlu-entailment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-wikibert_base_parsinlu_entailment_pipeline_fa.md b/docs/_posts/ahmedlone127/2024-09-24-wikibert_base_parsinlu_entailment_pipeline_fa.md new file mode 100644 index 00000000000000..70cc1a1cd2df88 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-wikibert_base_parsinlu_entailment_pipeline_fa.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Persian wikibert_base_parsinlu_entailment_pipeline pipeline BertForSequenceClassification from persiannlp +author: John Snow Labs +name: wikibert_base_parsinlu_entailment_pipeline +date: 2024-09-24 +tags: [fa, open_source, pipeline, onnx] +task: Text Classification +language: fa +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`wikibert_base_parsinlu_entailment_pipeline` is a Persian model originally trained by persiannlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/wikibert_base_parsinlu_entailment_pipeline_fa_5.5.0_3.0_1727219350247.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/wikibert_base_parsinlu_entailment_pipeline_fa_5.5.0_3.0_1727219350247.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("wikibert_base_parsinlu_entailment_pipeline", lang = "fa") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("wikibert_base_parsinlu_entailment_pipeline", lang = "fa") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|wikibert_base_parsinlu_entailment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|fa| +|Size:|380.3 MB| + +## References + +https://huggingface.co/persiannlp/wikibert-base-parsinlu-entailment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-wineberto_ner_en.md b/docs/_posts/ahmedlone127/2024-09-24-wineberto_ner_en.md new file mode 100644 index 00000000000000..4a1a00c6f4d05a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-wineberto_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English wineberto_ner BertForTokenClassification from panigrah +author: John Snow Labs +name: wineberto_ner +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`wineberto_ner` is a English model originally trained by panigrah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/wineberto_ner_en_5.5.0_3.0_1727203626360.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/wineberto_ner_en_5.5.0_3.0_1727203626360.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("wineberto_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("wineberto_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|wineberto_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/panigrah/wineberto-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-wineberto_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-wineberto_ner_pipeline_en.md new file mode 100644 index 00000000000000..6efe3745e5d6ce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-wineberto_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English wineberto_ner_pipeline pipeline BertForTokenClassification from panigrah +author: John Snow Labs +name: wineberto_ner_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`wineberto_ner_pipeline` is a English model originally trained by panigrah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/wineberto_ner_pipeline_en_5.5.0_3.0_1727203648432.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/wineberto_ner_pipeline_en_5.5.0_3.0_1727203648432.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("wineberto_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("wineberto_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|wineberto_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/panigrah/wineberto-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2_pipeline_en.md new file mode 100644 index 00000000000000..b27d4eb7c76015 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2_pipeline pipeline XlmRoBertaForSequenceClassification from vg055 +author: John Snow Labs +name: xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2_pipeline` is a English model originally trained by vg055. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2_pipeline_en_5.5.0_3.0_1727152814072.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2_pipeline_en_5.5.0_3.0_1727152814072.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_iberautextification2024_7030_4epo_task1_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|894.7 MB| + +## References + +https://huggingface.co/vg055/xlm-roberta-base-finetuned-IberAuTexTification2024-7030-4epo-task1-v2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_marc_tielupeng_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_marc_tielupeng_pipeline_en.md new file mode 100644 index 00000000000000..8d1513dd841347 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_marc_tielupeng_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_marc_tielupeng_pipeline pipeline XlmRoBertaForSequenceClassification from tielupeng +author: John Snow Labs +name: xlm_roberta_base_finetuned_marc_tielupeng_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_marc_tielupeng_pipeline` is a English model originally trained by tielupeng. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_marc_tielupeng_pipeline_en_5.5.0_3.0_1727156675655.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_marc_tielupeng_pipeline_en_5.5.0_3.0_1727156675655.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_marc_tielupeng_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_marc_tielupeng_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_marc_tielupeng_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|835.1 MB| + +## References + +https://huggingface.co/tielupeng/xlm-roberta-base-finetuned-marc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_english_khadija267_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_english_khadija267_en.md new file mode 100644 index 00000000000000..a8f2af5d235f51 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_english_khadija267_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_english_khadija267 XlmRoBertaForTokenClassification from khadija267 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_english_khadija267 +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_english_khadija267` is a English model originally trained by khadija267. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_khadija267_en_5.5.0_3.0_1727160892552.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_english_khadija267_en_5.5.0_3.0_1727160892552.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_khadija267","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_english_khadija267", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_english_khadija267| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|826.4 MB| + +## References + +https://huggingface.co/khadija267/xlm-roberta-base-finetuned-panx-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_french_cyrildever_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_french_cyrildever_en.md new file mode 100644 index 00000000000000..878def22d053cb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_french_cyrildever_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_cyrildever XlmRoBertaForTokenClassification from cyrildever +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_cyrildever +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_cyrildever` is a English model originally trained by cyrildever. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_cyrildever_en_5.5.0_3.0_1727148148262.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_cyrildever_en_5.5.0_3.0_1727148148262.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_cyrildever","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_cyrildever", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_cyrildever| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/cyrildever/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_french_ridealist_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_french_ridealist_pipeline_en.md new file mode 100644 index 00000000000000..8098d39588e8d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_french_ridealist_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_ridealist_pipeline pipeline XlmRoBertaForTokenClassification from Ridealist +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_ridealist_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_ridealist_pipeline` is a English model originally trained by Ridealist. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_ridealist_pipeline_en_5.5.0_3.0_1727147792847.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_ridealist_pipeline_en_5.5.0_3.0_1727147792847.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_ridealist_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_french_ridealist_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_ridealist_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|827.9 MB| + +## References + +https://huggingface.co/Ridealist/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_french_zebans_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_french_zebans_en.md new file mode 100644 index 00000000000000..827d75ce5e06e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_french_zebans_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_french_zebans XlmRoBertaForTokenClassification from zebans +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_french_zebans +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_french_zebans` is a English model originally trained by zebans. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_zebans_en_5.5.0_3.0_1727160536340.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_french_zebans_en_5.5.0_3.0_1727160536340.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_zebans","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_french_zebans", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_french_zebans| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|831.2 MB| + +## References + +https://huggingface.co/zebans/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_dasooo_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_dasooo_en.md new file mode 100644 index 00000000000000..19602d988689e0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_dasooo_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_dasooo XlmRoBertaForTokenClassification from daSooo +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_dasooo +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_dasooo` is a English model originally trained by daSooo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_dasooo_en_5.5.0_3.0_1727180293698.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_dasooo_en_5.5.0_3.0_1727180293698.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_dasooo","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_dasooo", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_dasooo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|843.4 MB| + +## References + +https://huggingface.co/daSooo/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_isaacp_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_isaacp_en.md new file mode 100644 index 00000000000000..78eab5801e6db5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_isaacp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_isaacp XlmRoBertaForTokenClassification from Isaacp +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_isaacp +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_isaacp` is a English model originally trained by Isaacp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_isaacp_en_5.5.0_3.0_1727147434563.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_isaacp_en_5.5.0_3.0_1727147434563.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_isaacp","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_isaacp", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_isaacp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/Isaacp/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_ysige_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_ysige_en.md new file mode 100644 index 00000000000000..bb99e12f495196 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_french_ysige_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_french_ysige XlmRoBertaForTokenClassification from ysige +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_french_ysige +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_french_ysige` is a English model originally trained by ysige. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_ysige_en_5.5.0_3.0_1727214815789.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_french_ysige_en_5.5.0_3.0_1727214815789.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_ysige","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_french_ysige", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_french_ysige| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|858.2 MB| + +## References + +https://huggingface.co/ysige/xlm-roberta-base-finetuned-panx-de-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_hirosay_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_hirosay_en.md new file mode 100644 index 00000000000000..e2c9bac49ae777 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_hirosay_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_hirosay XlmRoBertaForTokenClassification from hirosay +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_hirosay +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_hirosay` is a English model originally trained by hirosay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hirosay_en_5.5.0_3.0_1727214830655.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hirosay_en_5.5.0_3.0_1727214830655.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_hirosay","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_hirosay", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_hirosay| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/hirosay/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_hirosay_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_hirosay_pipeline_en.md new file mode 100644 index 00000000000000..644244ca5c4283 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_hirosay_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_hirosay_pipeline pipeline XlmRoBertaForTokenClassification from hirosay +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_hirosay_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_hirosay_pipeline` is a English model originally trained by hirosay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hirosay_pipeline_en_5.5.0_3.0_1727214903264.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_hirosay_pipeline_en_5.5.0_3.0_1727214903264.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_hirosay_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_hirosay_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_hirosay_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/hirosay/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_laurentiustancioiu_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_laurentiustancioiu_en.md new file mode 100644 index 00000000000000..ba1a8eb9efa145 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_laurentiustancioiu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_laurentiustancioiu XlmRoBertaForTokenClassification from LaurentiuStancioiu +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_laurentiustancioiu +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_laurentiustancioiu` is a English model originally trained by LaurentiuStancioiu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_laurentiustancioiu_en_5.5.0_3.0_1727214813344.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_laurentiustancioiu_en_5.5.0_3.0_1727214813344.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_laurentiustancioiu","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_laurentiustancioiu", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_laurentiustancioiu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|853.8 MB| + +## References + +https://huggingface.co/LaurentiuStancioiu/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_param_mehta_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_param_mehta_en.md new file mode 100644 index 00000000000000..f91f198320e9b3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_param_mehta_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_param_mehta XlmRoBertaForTokenClassification from param-mehta +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_param_mehta +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_param_mehta` is a English model originally trained by param-mehta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_param_mehta_en_5.5.0_3.0_1727215226799.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_param_mehta_en_5.5.0_3.0_1727215226799.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_param_mehta","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_param_mehta", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_param_mehta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|841.1 MB| + +## References + +https://huggingface.co/param-mehta/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_param_mehta_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_param_mehta_pipeline_en.md new file mode 100644 index 00000000000000..961480b8010717 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_param_mehta_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_param_mehta_pipeline pipeline XlmRoBertaForTokenClassification from param-mehta +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_param_mehta_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_param_mehta_pipeline` is a English model originally trained by param-mehta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_param_mehta_pipeline_en_5.5.0_3.0_1727215312498.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_param_mehta_pipeline_en_5.5.0_3.0_1727215312498.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_param_mehta_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_german_param_mehta_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_param_mehta_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|841.2 MB| + +## References + +https://huggingface.co/param-mehta/xlm-roberta-base-finetuned-panx-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_youngbreadho_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_youngbreadho_en.md new file mode 100644 index 00000000000000..dbff1a6fc9ff6d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_german_youngbreadho_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_german_youngbreadho XlmRoBertaForTokenClassification from youngbreadho +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_german_youngbreadho +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_german_youngbreadho` is a English model originally trained by youngbreadho. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_youngbreadho_en_5.5.0_3.0_1727215193256.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_german_youngbreadho_en_5.5.0_3.0_1727215193256.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_youngbreadho","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_german_youngbreadho", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_german_youngbreadho| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|857.8 MB| + +## References + +https://huggingface.co/youngbreadho/xlm-roberta-base-finetuned-panx-de \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_hindi_urdu_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_hindi_urdu_pipeline_en.md new file mode 100644 index 00000000000000..dbb7386fcfb873 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_hindi_urdu_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_hindi_urdu_pipeline pipeline XlmRoBertaForTokenClassification from DeepaPeri +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_hindi_urdu_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_hindi_urdu_pipeline` is a English model originally trained by DeepaPeri. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_hindi_urdu_pipeline_en_5.5.0_3.0_1727147963171.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_hindi_urdu_pipeline_en_5.5.0_3.0_1727147963171.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_panx_hindi_urdu_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_panx_hindi_urdu_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_hindi_urdu_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|828.1 MB| + +## References + +https://huggingface.co/DeepaPeri/xlm-roberta-base-finetuned-panx-hi-ur + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_italian_khadija267_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_italian_khadija267_en.md new file mode 100644 index 00000000000000..f2692a521846ff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_finetuned_panx_italian_khadija267_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_panx_italian_khadija267 XlmRoBertaForTokenClassification from khadija267 +author: John Snow Labs +name: xlm_roberta_base_finetuned_panx_italian_khadija267 +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, xlm_roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_panx_italian_khadija267` is a English model originally trained by khadija267. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_khadija267_en_5.5.0_3.0_1727174887087.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_panx_italian_khadija267_en_5.5.0_3.0_1727174887087.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_khadija267","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = XlmRoBertaForTokenClassification.pretrained("xlm_roberta_base_finetuned_panx_italian_khadija267", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_panx_italian_khadija267| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|828.6 MB| + +## References + +https://huggingface.co/khadija267/xlm-roberta-base-finetuned-panx-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_mixed_aug_swap_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_mixed_aug_swap_pipeline_en.md new file mode 100644 index 00000000000000..33071f17ec7a32 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_mixed_aug_swap_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_mixed_aug_swap_pipeline pipeline XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_mixed_aug_swap_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_mixed_aug_swap_pipeline` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_mixed_aug_swap_pipeline_en_5.5.0_3.0_1727153213563.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_mixed_aug_swap_pipeline_en_5.5.0_3.0_1727153213563.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_mixed_aug_swap_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_mixed_aug_swap_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_mixed_aug_swap_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|796.2 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Mixed-aug_swap + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_pharmaconer_kanansharmaa_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_pharmaconer_kanansharmaa_en.md new file mode 100644 index 00000000000000..3134784b614898 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_pharmaconer_kanansharmaa_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_pharmaconer_kanansharmaa RoBertaForTokenClassification from kanansharmaa +author: John Snow Labs +name: xlm_roberta_base_pharmaconer_kanansharmaa +date: 2024-09-24 +tags: [en, open_source, onnx, token_classification, roberta, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_pharmaconer_kanansharmaa` is a English model originally trained by kanansharmaa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_pharmaconer_kanansharmaa_en_5.5.0_3.0_1727139613353.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_pharmaconer_kanansharmaa_en_5.5.0_3.0_1727139613353.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = RoBertaForTokenClassification.pretrained("xlm_roberta_base_pharmaconer_kanansharmaa","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = RoBertaForTokenClassification.pretrained("xlm_roberta_base_pharmaconer_kanansharmaa", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_pharmaconer_kanansharmaa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|829.0 MB| + +## References + +https://huggingface.co/kanansharmaa/xlm-roberta-base-pharmaconer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_russian_sentiment_liniscrowd_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_russian_sentiment_liniscrowd_en.md new file mode 100644 index 00000000000000..1b7378ced619f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_russian_sentiment_liniscrowd_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_russian_sentiment_liniscrowd XlmRoBertaForSequenceClassification from sismetanin +author: John Snow Labs +name: xlm_roberta_base_russian_sentiment_liniscrowd +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_russian_sentiment_liniscrowd` is a English model originally trained by sismetanin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_russian_sentiment_liniscrowd_en_5.5.0_3.0_1727152693144.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_russian_sentiment_liniscrowd_en_5.5.0_3.0_1727152693144.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_russian_sentiment_liniscrowd","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_russian_sentiment_liniscrowd", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_russian_sentiment_liniscrowd| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|800.0 MB| + +## References + +https://huggingface.co/sismetanin/xlm_roberta_base-ru-sentiment-liniscrowd \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_vietnam_aug_swap_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_vietnam_aug_swap_pipeline_en.md new file mode 100644 index 00000000000000..b111e1f8fac0dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_roberta_base_vietnam_aug_swap_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_vietnam_aug_swap_pipeline pipeline XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_vietnam_aug_swap_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_vietnam_aug_swap_pipeline` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_vietnam_aug_swap_pipeline_en_5.5.0_3.0_1727152761189.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_vietnam_aug_swap_pipeline_en_5.5.0_3.0_1727152761189.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_vietnam_aug_swap_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_vietnam_aug_swap_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_vietnam_aug_swap_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|794.3 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-VietNam-aug_swap + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlm_v_base_trimmed_german_tweet_sentiment_german_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlm_v_base_trimmed_german_tweet_sentiment_german_pipeline_en.md new file mode 100644 index 00000000000000..ee38ed1defc192 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlm_v_base_trimmed_german_tweet_sentiment_german_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_v_base_trimmed_german_tweet_sentiment_german_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_v_base_trimmed_german_tweet_sentiment_german_pipeline +date: 2024-09-24 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_v_base_trimmed_german_tweet_sentiment_german_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_v_base_trimmed_german_tweet_sentiment_german_pipeline_en_5.5.0_3.0_1727152675534.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_v_base_trimmed_german_tweet_sentiment_german_pipeline_en_5.5.0_3.0_1727152675534.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_v_base_trimmed_german_tweet_sentiment_german_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_v_base_trimmed_german_tweet_sentiment_german_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_v_base_trimmed_german_tweet_sentiment_german_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|750.9 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-v-base-trimmed-de-tweet-sentiment-de + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlmr_estonian_english_all_shuffled_1986_test1000_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlmr_estonian_english_all_shuffled_1986_test1000_en.md new file mode 100644 index 00000000000000..c86e97db5e5d0a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlmr_estonian_english_all_shuffled_1986_test1000_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlmr_estonian_english_all_shuffled_1986_test1000 XlmRoBertaForSequenceClassification from patpizio +author: John Snow Labs +name: xlmr_estonian_english_all_shuffled_1986_test1000 +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_estonian_english_all_shuffled_1986_test1000` is a English model originally trained by patpizio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_estonian_english_all_shuffled_1986_test1000_en_5.5.0_3.0_1727155839562.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_estonian_english_all_shuffled_1986_test1000_en_5.5.0_3.0_1727155839562.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_estonian_english_all_shuffled_1986_test1000","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_estonian_english_all_shuffled_1986_test1000", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_estonian_english_all_shuffled_1986_test1000| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|820.5 MB| + +## References + +https://huggingface.co/patpizio/xlmr-et-en-all_shuffled-1986-test1000 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlmr_semantic_textual_relatedness_en.md b/docs/_posts/ahmedlone127/2024-09-24-xlmr_semantic_textual_relatedness_en.md new file mode 100644 index 00000000000000..430ae9c0560951 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlmr_semantic_textual_relatedness_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlmr_semantic_textual_relatedness XlmRoBertaForSequenceClassification from kietnt0603 +author: John Snow Labs +name: xlmr_semantic_textual_relatedness +date: 2024-09-24 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmr_semantic_textual_relatedness` is a English model originally trained by kietnt0603. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmr_semantic_textual_relatedness_en_5.5.0_3.0_1727156208292.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmr_semantic_textual_relatedness_en_5.5.0_3.0_1727156208292.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_semantic_textual_relatedness","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlmr_semantic_textual_relatedness", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmr_semantic_textual_relatedness| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/kietnt0603/xlmr-semantic-textual-relatedness \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlmroberta_ner_moghis_base_finetuned_panx_fr.md b/docs/_posts/ahmedlone127/2024-09-24-xlmroberta_ner_moghis_base_finetuned_panx_fr.md new file mode 100644 index 00000000000000..dc1764a995290c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlmroberta_ner_moghis_base_finetuned_panx_fr.md @@ -0,0 +1,112 @@ +--- +layout: model +title: French XLMRobertaForTokenClassification Base Cased model (from moghis) +author: John Snow Labs +name: xlmroberta_ner_moghis_base_finetuned_panx +date: 2024-09-24 +tags: [fr, open_source, xlm_roberta, ner, onnx] +task: Named Entity Recognition +language: fr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XLMRobertaForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-finetuned-panx-fr` is a French model originally trained by `moghis`. + +## Predicted Entities + +`PER`, `LOC`, `ORG` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_moghis_base_finetuned_panx_fr_5.5.0_3.0_1727215007838.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_moghis_base_finetuned_panx_fr_5.5.0_3.0_1727215007838.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_moghis_base_finetuned_panx","fr") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("ner") + +ner_converter = NerConverter()\ + .setInputCols(["document", "token", "ner"])\ + .setOutputCol("ner_chunk") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, token_classifier, ner_converter]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCols(Array("text")) + .setOutputCols(Array("document")) + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val token_classifier = XlmRoBertaForTokenClassification.pretrained("xlmroberta_ner_moghis_base_finetuned_panx","fr") + .setInputCols(Array("document", "token")) + .setOutputCol("ner") + +val ner_converter = new NerConverter() + .setInputCols(Array("document", "token', "ner")) + .setOutputCol("ner_chunk") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, token_classifier, ner_converter)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("fr.ner.xlmr_roberta.xtreme.base_finetuned.by_moghis").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_moghis_base_finetuned_panx| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|fr| +|Size:|840.9 MB| + +## References + +References + +- https://huggingface.co/moghis/xlm-roberta-base-finetuned-panx-fr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-24-xlmroberta_ner_moghis_base_finetuned_panx_pipeline_fr.md b/docs/_posts/ahmedlone127/2024-09-24-xlmroberta_ner_moghis_base_finetuned_panx_pipeline_fr.md new file mode 100644 index 00000000000000..e1713b7fd9f52e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-24-xlmroberta_ner_moghis_base_finetuned_panx_pipeline_fr.md @@ -0,0 +1,70 @@ +--- +layout: model +title: French xlmroberta_ner_moghis_base_finetuned_panx_pipeline pipeline XlmRoBertaForTokenClassification from moghis +author: John Snow Labs +name: xlmroberta_ner_moghis_base_finetuned_panx_pipeline +date: 2024-09-24 +tags: [fr, open_source, pipeline, onnx] +task: Named Entity Recognition +language: fr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmroberta_ner_moghis_base_finetuned_panx_pipeline` is a French model originally trained by moghis. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_moghis_base_finetuned_panx_pipeline_fr_5.5.0_3.0_1727215088565.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_ner_moghis_base_finetuned_panx_pipeline_fr_5.5.0_3.0_1727215088565.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_ner_moghis_base_finetuned_panx_pipeline", lang = "fr") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_ner_moghis_base_finetuned_panx_pipeline", lang = "fr") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_ner_moghis_base_finetuned_panx_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|fr| +|Size:|840.9 MB| + +## References + +https://huggingface.co/moghis/xlm-roberta-base-finetuned-panx-fr + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-2d_oomv2_800_en.md b/docs/_posts/ahmedlone127/2024-09-25-2d_oomv2_800_en.md new file mode 100644 index 00000000000000..168ee37ad7b9fb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-2d_oomv2_800_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 2d_oomv2_800 BertForSequenceClassification from abbassix +author: John Snow Labs +name: 2d_oomv2_800 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2d_oomv2_800` is a English model originally trained by abbassix. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2d_oomv2_800_en_5.5.0_3.0_1727288371864.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2d_oomv2_800_en_5.5.0_3.0_1727288371864.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("2d_oomv2_800","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("2d_oomv2_800", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2d_oomv2_800| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/abbassix/2d_oomv2_800 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-2d_psn_1600_en.md b/docs/_posts/ahmedlone127/2024-09-25-2d_psn_1600_en.md new file mode 100644 index 00000000000000..b0b45e0f0eada9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-2d_psn_1600_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English 2d_psn_1600 BertForSequenceClassification from abbassix +author: John Snow Labs +name: 2d_psn_1600 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`2d_psn_1600` is a English model originally trained by abbassix. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/2d_psn_1600_en_5.5.0_3.0_1727276200209.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/2d_psn_1600_en_5.5.0_3.0_1727276200209.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("2d_psn_1600","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("2d_psn_1600", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|2d_psn_1600| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/abbassix/2d_psn_1600 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-acronyms_baseline_vert_correct_clinicalbert_en.md b/docs/_posts/ahmedlone127/2024-09-25-acronyms_baseline_vert_correct_clinicalbert_en.md new file mode 100644 index 00000000000000..4e1e3bd81fc370 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-acronyms_baseline_vert_correct_clinicalbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English acronyms_baseline_vert_correct_clinicalbert BertForSequenceClassification from Wiggily +author: John Snow Labs +name: acronyms_baseline_vert_correct_clinicalbert +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`acronyms_baseline_vert_correct_clinicalbert` is a English model originally trained by Wiggily. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/acronyms_baseline_vert_correct_clinicalbert_en_5.5.0_3.0_1727245392430.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/acronyms_baseline_vert_correct_clinicalbert_en_5.5.0_3.0_1727245392430.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("acronyms_baseline_vert_correct_clinicalbert","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("acronyms_baseline_vert_correct_clinicalbert", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|acronyms_baseline_vert_correct_clinicalbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.5 MB| + +## References + +https://huggingface.co/Wiggily/acronyms_baseline_vert_correct_clinicalbert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-adrv2024_markadamsmsba24_en.md b/docs/_posts/ahmedlone127/2024-09-25-adrv2024_markadamsmsba24_en.md new file mode 100644 index 00000000000000..cd1b83272e7931 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-adrv2024_markadamsmsba24_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English adrv2024_markadamsmsba24 BertForSequenceClassification from MarkAdamsMSBA24 +author: John Snow Labs +name: adrv2024_markadamsmsba24 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`adrv2024_markadamsmsba24` is a English model originally trained by MarkAdamsMSBA24. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/adrv2024_markadamsmsba24_en_5.5.0_3.0_1727267305241.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/adrv2024_markadamsmsba24_en_5.5.0_3.0_1727267305241.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("adrv2024_markadamsmsba24","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("adrv2024_markadamsmsba24", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|adrv2024_markadamsmsba24| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/MarkAdamsMSBA24/ADRv2024 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-adrv2024_paragon_analytics_en.md b/docs/_posts/ahmedlone127/2024-09-25-adrv2024_paragon_analytics_en.md new file mode 100644 index 00000000000000..ae86905aca6194 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-adrv2024_paragon_analytics_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English adrv2024_paragon_analytics BertForSequenceClassification from paragon-analytics +author: John Snow Labs +name: adrv2024_paragon_analytics +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`adrv2024_paragon_analytics` is a English model originally trained by paragon-analytics. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/adrv2024_paragon_analytics_en_5.5.0_3.0_1727268598866.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/adrv2024_paragon_analytics_en_5.5.0_3.0_1727268598866.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("adrv2024_paragon_analytics","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("adrv2024_paragon_analytics", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|adrv2024_paragon_analytics| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/paragon-analytics/ADRv2024 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-adrv2024_paragon_analytics_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-adrv2024_paragon_analytics_pipeline_en.md new file mode 100644 index 00000000000000..4a5da165abfb4c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-adrv2024_paragon_analytics_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English adrv2024_paragon_analytics_pipeline pipeline BertForSequenceClassification from paragon-analytics +author: John Snow Labs +name: adrv2024_paragon_analytics_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`adrv2024_paragon_analytics_pipeline` is a English model originally trained by paragon-analytics. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/adrv2024_paragon_analytics_pipeline_en_5.5.0_3.0_1727268620346.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/adrv2024_paragon_analytics_pipeline_en_5.5.0_3.0_1727268620346.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("adrv2024_paragon_analytics_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("adrv2024_paragon_analytics_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|adrv2024_paragon_analytics_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/paragon-analytics/ADRv2024 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-advance_bert_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-advance_bert_classification_pipeline_en.md new file mode 100644 index 00000000000000..7a5e8d4a1b45bf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-advance_bert_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English advance_bert_classification_pipeline pipeline BertForSequenceClassification from Kurkur99 +author: John Snow Labs +name: advance_bert_classification_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`advance_bert_classification_pipeline` is a English model originally trained by Kurkur99. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/advance_bert_classification_pipeline_en_5.5.0_3.0_1727269957471.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/advance_bert_classification_pipeline_en_5.5.0_3.0_1727269957471.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("advance_bert_classification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("advance_bert_classification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|advance_bert_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|414.1 MB| + +## References + +https://huggingface.co/Kurkur99/Advance_Bert_Classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-aes_bert_base_sp90_lr1e_05_wr1e_01_wd1e_02_ep15_elsa_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-aes_bert_base_sp90_lr1e_05_wr1e_01_wd1e_02_ep15_elsa_pipeline_en.md new file mode 100644 index 00000000000000..8a475e9031f6ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-aes_bert_base_sp90_lr1e_05_wr1e_01_wd1e_02_ep15_elsa_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English aes_bert_base_sp90_lr1e_05_wr1e_01_wd1e_02_ep15_elsa_pipeline pipeline BertForSequenceClassification from ys7yoo +author: John Snow Labs +name: aes_bert_base_sp90_lr1e_05_wr1e_01_wd1e_02_ep15_elsa_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`aes_bert_base_sp90_lr1e_05_wr1e_01_wd1e_02_ep15_elsa_pipeline` is a English model originally trained by ys7yoo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/aes_bert_base_sp90_lr1e_05_wr1e_01_wd1e_02_ep15_elsa_pipeline_en_5.5.0_3.0_1727287956799.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/aes_bert_base_sp90_lr1e_05_wr1e_01_wd1e_02_ep15_elsa_pipeline_en_5.5.0_3.0_1727287956799.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("aes_bert_base_sp90_lr1e_05_wr1e_01_wd1e_02_ep15_elsa_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("aes_bert_base_sp90_lr1e_05_wr1e_01_wd1e_02_ep15_elsa_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|aes_bert_base_sp90_lr1e_05_wr1e_01_wd1e_02_ep15_elsa_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|414.6 MB| + +## References + +https://huggingface.co/ys7yoo/aes_bert-base_sp90_lr1e-05_wr1e-01_wd1e-02_ep15_elsa + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-aes_enem_models_sourcea_regression_from_bertimbau_large_c5_en.md b/docs/_posts/ahmedlone127/2024-09-25-aes_enem_models_sourcea_regression_from_bertimbau_large_c5_en.md new file mode 100644 index 00000000000000..239ed97a46214c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-aes_enem_models_sourcea_regression_from_bertimbau_large_c5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English aes_enem_models_sourcea_regression_from_bertimbau_large_c5 BertForSequenceClassification from kamel-usp +author: John Snow Labs +name: aes_enem_models_sourcea_regression_from_bertimbau_large_c5 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`aes_enem_models_sourcea_regression_from_bertimbau_large_c5` is a English model originally trained by kamel-usp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/aes_enem_models_sourcea_regression_from_bertimbau_large_c5_en_5.5.0_3.0_1727261656934.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/aes_enem_models_sourcea_regression_from_bertimbau_large_c5_en_5.5.0_3.0_1727261656934.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("aes_enem_models_sourcea_regression_from_bertimbau_large_c5","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("aes_enem_models_sourcea_regression_from_bertimbau_large_c5", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|aes_enem_models_sourcea_regression_from_bertimbau_large_c5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/kamel-usp/aes_enem_models-sourceA-regression-from-bertimbau-large-C5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-ag_news_38400_bert_base_uncased_en.md b/docs/_posts/ahmedlone127/2024-09-25-ag_news_38400_bert_base_uncased_en.md new file mode 100644 index 00000000000000..0f8a792179dc19 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-ag_news_38400_bert_base_uncased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ag_news_38400_bert_base_uncased BertForSequenceClassification from Kyle1668 +author: John Snow Labs +name: ag_news_38400_bert_base_uncased +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ag_news_38400_bert_base_uncased` is a English model originally trained by Kyle1668. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ag_news_38400_bert_base_uncased_en_5.5.0_3.0_1727222427253.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ag_news_38400_bert_base_uncased_en_5.5.0_3.0_1727222427253.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("ag_news_38400_bert_base_uncased","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("ag_news_38400_bert_base_uncased", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ag_news_38400_bert_base_uncased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Kyle1668/ag-news-38400-bert-base-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-alberti_stanzas_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-alberti_stanzas_pipeline_en.md new file mode 100644 index 00000000000000..b99a40fc10fd87 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-alberti_stanzas_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English alberti_stanzas_pipeline pipeline BertForSequenceClassification from alvp +author: John Snow Labs +name: alberti_stanzas_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`alberti_stanzas_pipeline` is a English model originally trained by alvp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/alberti_stanzas_pipeline_en_5.5.0_3.0_1727267423790.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/alberti_stanzas_pipeline_en_5.5.0_3.0_1727267423790.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("alberti_stanzas_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("alberti_stanzas_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|alberti_stanzas_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|666.7 MB| + +## References + +https://huggingface.co/alvp/alberti-stanzas + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-albertv2_dc_unsorted_dec_cf_en.md b/docs/_posts/ahmedlone127/2024-09-25-albertv2_dc_unsorted_dec_cf_en.md new file mode 100644 index 00000000000000..c60a873920f9fa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-albertv2_dc_unsorted_dec_cf_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English albertv2_dc_unsorted_dec_cf BertForSequenceClassification from rpii2023 +author: John Snow Labs +name: albertv2_dc_unsorted_dec_cf +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albertv2_dc_unsorted_dec_cf` is a English model originally trained by rpii2023. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albertv2_dc_unsorted_dec_cf_en_5.5.0_3.0_1727239459657.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albertv2_dc_unsorted_dec_cf_en_5.5.0_3.0_1727239459657.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("albertv2_dc_unsorted_dec_cf","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("albertv2_dc_unsorted_dec_cf", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albertv2_dc_unsorted_dec_cf| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/rpii2023/albertv2_DC_unsorted_DEC_CF \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-albertv2_dc_unsorted_dec_cf_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-albertv2_dc_unsorted_dec_cf_pipeline_en.md new file mode 100644 index 00000000000000..d9485925e26a78 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-albertv2_dc_unsorted_dec_cf_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English albertv2_dc_unsorted_dec_cf_pipeline pipeline BertForSequenceClassification from rpii2023 +author: John Snow Labs +name: albertv2_dc_unsorted_dec_cf_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`albertv2_dc_unsorted_dec_cf_pipeline` is a English model originally trained by rpii2023. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/albertv2_dc_unsorted_dec_cf_pipeline_en_5.5.0_3.0_1727239484248.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/albertv2_dc_unsorted_dec_cf_pipeline_en_5.5.0_3.0_1727239484248.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("albertv2_dc_unsorted_dec_cf_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("albertv2_dc_unsorted_dec_cf_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|albertv2_dc_unsorted_dec_cf_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/rpii2023/albertv2_DC_unsorted_DEC_CF + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-alsval_en.md b/docs/_posts/ahmedlone127/2024-09-25-alsval_en.md new file mode 100644 index 00000000000000..cb341ee6e114b0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-alsval_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English alsval BertForSequenceClassification from yeamerci +author: John Snow Labs +name: alsval +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`alsval` is a English model originally trained by yeamerci. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/alsval_en_5.5.0_3.0_1727268126325.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/alsval_en_5.5.0_3.0_1727268126325.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("alsval","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("alsval", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|alsval| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|664.5 MB| + +## References + +https://huggingface.co/yeamerci/alsval \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-alsval_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-alsval_pipeline_en.md new file mode 100644 index 00000000000000..1ba604632e3b72 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-alsval_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English alsval_pipeline pipeline BertForSequenceClassification from yeamerci +author: John Snow Labs +name: alsval_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`alsval_pipeline` is a English model originally trained by yeamerci. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/alsval_pipeline_en_5.5.0_3.0_1727268162233.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/alsval_pipeline_en_5.5.0_3.0_1727268162233.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("alsval_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("alsval_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|alsval_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|664.5 MB| + +## References + +https://huggingface.co/yeamerci/alsval + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-amir_clinicalbert_2_en.md b/docs/_posts/ahmedlone127/2024-09-25-amir_clinicalbert_2_en.md new file mode 100644 index 00000000000000..9e970332a74bf2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-amir_clinicalbert_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English amir_clinicalbert_2 BertForTokenClassification from amirali26 +author: John Snow Labs +name: amir_clinicalbert_2 +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`amir_clinicalbert_2` is a English model originally trained by amirali26. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/amir_clinicalbert_2_en_5.5.0_3.0_1727282119447.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/amir_clinicalbert_2_en_5.5.0_3.0_1727282119447.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("amir_clinicalbert_2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("amir_clinicalbert_2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|amir_clinicalbert_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.3 MB| + +## References + +https://huggingface.co/amirali26/amir-clinicalbert-2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-amir_clinicalbert_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-amir_clinicalbert_2_pipeline_en.md new file mode 100644 index 00000000000000..413501e142efc7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-amir_clinicalbert_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English amir_clinicalbert_2_pipeline pipeline BertForTokenClassification from amirali26 +author: John Snow Labs +name: amir_clinicalbert_2_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`amir_clinicalbert_2_pipeline` is a English model originally trained by amirali26. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/amir_clinicalbert_2_pipeline_en_5.5.0_3.0_1727282140590.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/amir_clinicalbert_2_pipeline_en_5.5.0_3.0_1727282140590.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("amir_clinicalbert_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("amir_clinicalbert_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|amir_clinicalbert_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.4 MB| + +## References + +https://huggingface.co/amirali26/amir-clinicalbert-2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-amir_clinicalbert_specialities_en.md b/docs/_posts/ahmedlone127/2024-09-25-amir_clinicalbert_specialities_en.md new file mode 100644 index 00000000000000..d0ad7550d1187f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-amir_clinicalbert_specialities_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English amir_clinicalbert_specialities BertForTokenClassification from amirali26 +author: John Snow Labs +name: amir_clinicalbert_specialities +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`amir_clinicalbert_specialities` is a English model originally trained by amirali26. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/amir_clinicalbert_specialities_en_5.5.0_3.0_1727260708867.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/amir_clinicalbert_specialities_en_5.5.0_3.0_1727260708867.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("amir_clinicalbert_specialities","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("amir_clinicalbert_specialities", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|amir_clinicalbert_specialities| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.3 MB| + +## References + +https://huggingface.co/amirali26/amir-clinicalbert-specialities \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-amir_clinicalbert_specialities_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-amir_clinicalbert_specialities_pipeline_en.md new file mode 100644 index 00000000000000..1721fd482c3bbe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-amir_clinicalbert_specialities_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English amir_clinicalbert_specialities_pipeline pipeline BertForTokenClassification from amirali26 +author: John Snow Labs +name: amir_clinicalbert_specialities_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`amir_clinicalbert_specialities_pipeline` is a English model originally trained by amirali26. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/amir_clinicalbert_specialities_pipeline_en_5.5.0_3.0_1727260729965.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/amir_clinicalbert_specialities_pipeline_en_5.5.0_3.0_1727260729965.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("amir_clinicalbert_specialities_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("amir_clinicalbert_specialities_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|amir_clinicalbert_specialities_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.4 MB| + +## References + +https://huggingface.co/amirali26/amir-clinicalbert-specialities + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-anglicisms_spanish_beto_es.md b/docs/_posts/ahmedlone127/2024-09-25-anglicisms_spanish_beto_es.md new file mode 100644 index 00000000000000..0124c95d597974 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-anglicisms_spanish_beto_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish anglicisms_spanish_beto BertForTokenClassification from lirondos +author: John Snow Labs +name: anglicisms_spanish_beto +date: 2024-09-25 +tags: [es, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`anglicisms_spanish_beto` is a Castilian, Spanish model originally trained by lirondos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/anglicisms_spanish_beto_es_5.5.0_3.0_1727249839024.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/anglicisms_spanish_beto_es_5.5.0_3.0_1727249839024.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("anglicisms_spanish_beto","es") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("anglicisms_spanish_beto", "es") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|anglicisms_spanish_beto| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|409.5 MB| + +## References + +https://huggingface.co/lirondos/anglicisms-spanish-beto \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-arqmath_bert_base_cased_en.md b/docs/_posts/ahmedlone127/2024-09-25-arqmath_bert_base_cased_en.md new file mode 100644 index 00000000000000..019ae9d313bd75 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-arqmath_bert_base_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English arqmath_bert_base_cased BertForSequenceClassification from malteos +author: John Snow Labs +name: arqmath_bert_base_cased +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`arqmath_bert_base_cased` is a English model originally trained by malteos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/arqmath_bert_base_cased_en_5.5.0_3.0_1727273232573.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/arqmath_bert_base_cased_en_5.5.0_3.0_1727273232573.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("arqmath_bert_base_cased","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("arqmath_bert_base_cased", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|arqmath_bert_base_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.7 MB| + +## References + +https://huggingface.co/malteos/arqmath-bert-base-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-authorparsermodel_de.md b/docs/_posts/ahmedlone127/2024-09-25-authorparsermodel_de.md new file mode 100644 index 00000000000000..1d10fbc22c7ade --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-authorparsermodel_de.md @@ -0,0 +1,94 @@ +--- +layout: model +title: German authorparsermodel BertForTokenClassification from GEOcite +author: John Snow Labs +name: authorparsermodel +date: 2024-09-25 +tags: [de, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`authorparsermodel` is a German model originally trained by GEOcite. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/authorparsermodel_de_5.5.0_3.0_1727280916608.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/authorparsermodel_de_5.5.0_3.0_1727280916608.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("authorparsermodel","de") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("authorparsermodel", "de") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|authorparsermodel| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|625.5 MB| + +## References + +https://huggingface.co/GEOcite/AuthorParserModel \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-autotrain_bertbase_imdb_1275748792_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-autotrain_bertbase_imdb_1275748792_pipeline_en.md new file mode 100644 index 00000000000000..5758eea8fb0bfc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-autotrain_bertbase_imdb_1275748792_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English autotrain_bertbase_imdb_1275748792_pipeline pipeline BertForSequenceClassification from sasha +author: John Snow Labs +name: autotrain_bertbase_imdb_1275748792_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_bertbase_imdb_1275748792_pipeline` is a English model originally trained by sasha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_bertbase_imdb_1275748792_pipeline_en_5.5.0_3.0_1727277487477.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_bertbase_imdb_1275748792_pipeline_en_5.5.0_3.0_1727277487477.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("autotrain_bertbase_imdb_1275748792_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("autotrain_bertbase_imdb_1275748792_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_bertbase_imdb_1275748792_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/sasha/autotrain-BERTBase-imdb-1275748792 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-autotrain_bertbase_imdb_1275748793_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-autotrain_bertbase_imdb_1275748793_pipeline_en.md new file mode 100644 index 00000000000000..6af0cba49af530 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-autotrain_bertbase_imdb_1275748793_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English autotrain_bertbase_imdb_1275748793_pipeline pipeline BertForSequenceClassification from sasha +author: John Snow Labs +name: autotrain_bertbase_imdb_1275748793_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`autotrain_bertbase_imdb_1275748793_pipeline` is a English model originally trained by sasha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/autotrain_bertbase_imdb_1275748793_pipeline_en_5.5.0_3.0_1727284881697.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/autotrain_bertbase_imdb_1275748793_pipeline_en_5.5.0_3.0_1727284881697.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("autotrain_bertbase_imdb_1275748793_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("autotrain_bertbase_imdb_1275748793_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|autotrain_bertbase_imdb_1275748793_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/sasha/autotrain-BERTBase-imdb-1275748793 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-banglabert_generator_bn.md b/docs/_posts/ahmedlone127/2024-09-25-banglabert_generator_bn.md new file mode 100644 index 00000000000000..3ea9e9f0f5d7a7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-banglabert_generator_bn.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Bengali banglabert_generator BertEmbeddings from csebuetnlp +author: John Snow Labs +name: banglabert_generator +date: 2024-09-25 +tags: [bn, open_source, onnx, embeddings, bert] +task: Embeddings +language: bn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`banglabert_generator` is a Bengali model originally trained by csebuetnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/banglabert_generator_bn_5.5.0_3.0_1727240835855.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/banglabert_generator_bn_5.5.0_3.0_1727240835855.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("banglabert_generator","bn") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("banglabert_generator","bn") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|banglabert_generator| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|bn| +|Size:|130.0 MB| + +## References + +https://huggingface.co/csebuetnlp/banglabert_generator \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-banglabert_generator_pipeline_bn.md b/docs/_posts/ahmedlone127/2024-09-25-banglabert_generator_pipeline_bn.md new file mode 100644 index 00000000000000..bcd28666242de2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-banglabert_generator_pipeline_bn.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Bengali banglabert_generator_pipeline pipeline BertEmbeddings from csebuetnlp +author: John Snow Labs +name: banglabert_generator_pipeline +date: 2024-09-25 +tags: [bn, open_source, pipeline, onnx] +task: Embeddings +language: bn +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`banglabert_generator_pipeline` is a Bengali model originally trained by csebuetnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/banglabert_generator_pipeline_bn_5.5.0_3.0_1727240842238.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/banglabert_generator_pipeline_bn_5.5.0_3.0_1727240842238.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("banglabert_generator_pipeline", lang = "bn") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("banglabert_generator_pipeline", lang = "bn") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|banglabert_generator_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|bn| +|Size:|130.0 MB| + +## References + +https://huggingface.co/csebuetnlp/banglabert_generator + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-base_bert_finetuned_mtsamples_en.md b/docs/_posts/ahmedlone127/2024-09-25-base_bert_finetuned_mtsamples_en.md new file mode 100644 index 00000000000000..f1693ed31295b2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-base_bert_finetuned_mtsamples_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English base_bert_finetuned_mtsamples BertForSequenceClassification from mnaylor +author: John Snow Labs +name: base_bert_finetuned_mtsamples +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`base_bert_finetuned_mtsamples` is a English model originally trained by mnaylor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/base_bert_finetuned_mtsamples_en_5.5.0_3.0_1727276195937.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/base_bert_finetuned_mtsamples_en_5.5.0_3.0_1727276195937.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("base_bert_finetuned_mtsamples","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("base_bert_finetuned_mtsamples", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|base_bert_finetuned_mtsamples| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/mnaylor/base-bert-finetuned-mtsamples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-base_bert_finetuned_mtsamples_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-base_bert_finetuned_mtsamples_pipeline_en.md new file mode 100644 index 00000000000000..017d8c81915876 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-base_bert_finetuned_mtsamples_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English base_bert_finetuned_mtsamples_pipeline pipeline BertForSequenceClassification from mnaylor +author: John Snow Labs +name: base_bert_finetuned_mtsamples_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`base_bert_finetuned_mtsamples_pipeline` is a English model originally trained by mnaylor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/base_bert_finetuned_mtsamples_pipeline_en_5.5.0_3.0_1727276218426.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/base_bert_finetuned_mtsamples_pipeline_en_5.5.0_3.0_1727276218426.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("base_bert_finetuned_mtsamples_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("base_bert_finetuned_mtsamples_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|base_bert_finetuned_mtsamples_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/mnaylor/base-bert-finetuned-mtsamples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_amazon_product_classification_small_data_epoch_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_amazon_product_classification_small_data_epoch_2_pipeline_en.md new file mode 100644 index 00000000000000..8199ee9276a584 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_amazon_product_classification_small_data_epoch_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_amazon_product_classification_small_data_epoch_2_pipeline pipeline BertForSequenceClassification from nthieu +author: John Snow Labs +name: bert_amazon_product_classification_small_data_epoch_2_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_amazon_product_classification_small_data_epoch_2_pipeline` is a English model originally trained by nthieu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_amazon_product_classification_small_data_epoch_2_pipeline_en_5.5.0_3.0_1727288372872.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_amazon_product_classification_small_data_epoch_2_pipeline_en_5.5.0_3.0_1727288372872.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_amazon_product_classification_small_data_epoch_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_amazon_product_classification_small_data_epoch_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_amazon_product_classification_small_data_epoch_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/nthieu/bert-amazon-product-classification-small-data-epoch-2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_banking77_pt2_andriydovgal_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_banking77_pt2_andriydovgal_en.md new file mode 100644 index 00000000000000..d4728bf5cd7f64 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_banking77_pt2_andriydovgal_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_banking77_pt2_andriydovgal BertForSequenceClassification from andriydovgal +author: John Snow Labs +name: bert_base_banking77_pt2_andriydovgal +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_banking77_pt2_andriydovgal` is a English model originally trained by andriydovgal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_banking77_pt2_andriydovgal_en_5.5.0_3.0_1727267267008.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_banking77_pt2_andriydovgal_en_5.5.0_3.0_1727267267008.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_banking77_pt2_andriydovgal","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_banking77_pt2_andriydovgal", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_banking77_pt2_andriydovgal| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.6 MB| + +## References + +https://huggingface.co/andriydovgal/bert-base-banking77-pt2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_banking77_pt2_bakuretso_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_banking77_pt2_bakuretso_en.md new file mode 100644 index 00000000000000..367bd93b5a6379 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_banking77_pt2_bakuretso_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_banking77_pt2_bakuretso BertForSequenceClassification from Bakuretso +author: John Snow Labs +name: bert_base_banking77_pt2_bakuretso +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_banking77_pt2_bakuretso` is a English model originally trained by Bakuretso. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_banking77_pt2_bakuretso_en_5.5.0_3.0_1727266221799.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_banking77_pt2_bakuretso_en_5.5.0_3.0_1727266221799.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_banking77_pt2_bakuretso","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_banking77_pt2_bakuretso", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_banking77_pt2_bakuretso| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.6 MB| + +## References + +https://huggingface.co/Bakuretso/bert-base-banking77-pt2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_banking77_pt2_dangdana_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_banking77_pt2_dangdana_pipeline_en.md new file mode 100644 index 00000000000000..f8b1e265acd0b9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_banking77_pt2_dangdana_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_banking77_pt2_dangdana_pipeline pipeline BertForSequenceClassification from dangdana +author: John Snow Labs +name: bert_base_banking77_pt2_dangdana_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_banking77_pt2_dangdana_pipeline` is a English model originally trained by dangdana. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_banking77_pt2_dangdana_pipeline_en_5.5.0_3.0_1727268699035.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_banking77_pt2_dangdana_pipeline_en_5.5.0_3.0_1727268699035.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_banking77_pt2_dangdana_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_banking77_pt2_dangdana_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_banking77_pt2_dangdana_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.6 MB| + +## References + +https://huggingface.co/dangdana/bert-base-banking77-pt2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_banking77_pt2_psj0919_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_banking77_pt2_psj0919_pipeline_en.md new file mode 100644 index 00000000000000..b562bbe1f78ca6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_banking77_pt2_psj0919_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_banking77_pt2_psj0919_pipeline pipeline BertForSequenceClassification from psj0919 +author: John Snow Labs +name: bert_base_banking77_pt2_psj0919_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_banking77_pt2_psj0919_pipeline` is a English model originally trained by psj0919. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_banking77_pt2_psj0919_pipeline_en_5.5.0_3.0_1727266525860.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_banking77_pt2_psj0919_pipeline_en_5.5.0_3.0_1727266525860.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_banking77_pt2_psj0919_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_banking77_pt2_psj0919_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_banking77_pt2_psj0919_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.6 MB| + +## References + +https://huggingface.co/psj0919/bert-base-banking77-pt2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_banking77_pt2_tonyla25_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_banking77_pt2_tonyla25_en.md new file mode 100644 index 00000000000000..1917bfc577377d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_banking77_pt2_tonyla25_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_banking77_pt2_tonyla25 BertForSequenceClassification from tonyla25 +author: John Snow Labs +name: bert_base_banking77_pt2_tonyla25 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_banking77_pt2_tonyla25` is a English model originally trained by tonyla25. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_banking77_pt2_tonyla25_en_5.5.0_3.0_1727268905496.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_banking77_pt2_tonyla25_en_5.5.0_3.0_1727268905496.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_banking77_pt2_tonyla25","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_banking77_pt2_tonyla25", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_banking77_pt2_tonyla25| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|8.7 MB| + +## References + +https://huggingface.co/tonyla25/bert-base-banking77-pt2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_bookcorpus_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_bookcorpus_en.md new file mode 100644 index 00000000000000..f5901fc9d00347 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_bookcorpus_en.md @@ -0,0 +1,92 @@ +--- +layout: model +title: English bert_base_bookcorpus BertEmbeddings from nicholasKluge +author: John Snow Labs +name: bert_base_bookcorpus +date: 2024-09-25 +tags: [bert, en, open_source, fill_mask, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_bookcorpus` is a English model originally trained by nicholasKluge. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_bookcorpus_en_5.5.0_3.0_1727240901527.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_bookcorpus_en_5.5.0_3.0_1727240901527.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +document_assembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("documents") + + +embeddings =BertEmbeddings.pretrained("bert_base_bookcorpus","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([document_assembler, embeddings]) + +pipelineModel = pipeline.fit(data) + +pipelineDF = pipelineModel.transform(data) +``` +```scala +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("embeddings") + +val embeddings = BertEmbeddings + .pretrained("bert_base_bookcorpus", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(document_assembler, embeddings)) + +val pipelineModel = pipeline.fit(data) + +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_bookcorpus| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|406.3 MB| + +## References + +References + +https://huggingface.co/nicholasKluge/bert-base-bookcorpus \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_buddhist_sanskrit_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_buddhist_sanskrit_en.md new file mode 100644 index 00000000000000..2716ceae4cf0ea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_buddhist_sanskrit_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_buddhist_sanskrit BertEmbeddings from Matej +author: John Snow Labs +name: bert_base_buddhist_sanskrit +date: 2024-09-25 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_buddhist_sanskrit` is a English model originally trained by Matej. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_buddhist_sanskrit_en_5.5.0_3.0_1727254998263.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_buddhist_sanskrit_en_5.5.0_3.0_1727254998263.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_base_buddhist_sanskrit","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_base_buddhist_sanskrit","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_buddhist_sanskrit| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|406.8 MB| + +## References + +https://huggingface.co/Matej/bert-base-buddhist-sanskrit \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_case_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_case_ner_pipeline_en.md new file mode 100644 index 00000000000000..bc158d8d5a7486 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_case_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_case_ner_pipeline pipeline BertForTokenClassification from raulgdp +author: John Snow Labs +name: bert_base_case_ner_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_case_ner_pipeline` is a English model originally trained by raulgdp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_case_ner_pipeline_en_5.5.0_3.0_1727280259153.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_case_ner_pipeline_en_5.5.0_3.0_1727280259153.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_case_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_case_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_case_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/raulgdp/bert-base-case-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_cased_0210_celential_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_cased_0210_celential_en.md new file mode 100644 index 00000000000000..6434d9ea67151b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_cased_0210_celential_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_cased_0210_celential BertForSequenceClassification from feiyangDu +author: John Snow Labs +name: bert_base_cased_0210_celential +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_0210_celential` is a English model originally trained by feiyangDu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_0210_celential_en_5.5.0_3.0_1727285678748.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_0210_celential_en_5.5.0_3.0_1727285678748.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_cased_0210_celential","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_cased_0210_celential", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_0210_celential| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/feiyangDu/bert-base-cased-0210-celential \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_cased_cola_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_cased_cola_en.md new file mode 100644 index 00000000000000..075ad379cfff85 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_cased_cola_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_cased_cola BertForSequenceClassification from gmihaila +author: John Snow Labs +name: bert_base_cased_cola +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_cola` is a English model originally trained by gmihaila. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_cola_en_5.5.0_3.0_1727287559197.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_cola_en_5.5.0_3.0_1727287559197.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_cased_cola","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_cased_cola", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_cola| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/gmihaila/bert-base-cased-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_cased_english_sentweet_derogatory_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_cased_english_sentweet_derogatory_pipeline_en.md new file mode 100644 index 00000000000000..8c203e81d778fb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_cased_english_sentweet_derogatory_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_cased_english_sentweet_derogatory_pipeline pipeline BertForSequenceClassification from jayanta +author: John Snow Labs +name: bert_base_cased_english_sentweet_derogatory_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_english_sentweet_derogatory_pipeline` is a English model originally trained by jayanta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_english_sentweet_derogatory_pipeline_en_5.5.0_3.0_1727288791450.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_english_sentweet_derogatory_pipeline_en_5.5.0_3.0_1727288791450.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_cased_english_sentweet_derogatory_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_cased_english_sentweet_derogatory_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_english_sentweet_derogatory_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/jayanta/bert-base-cased-english-sentweet-Derogatory + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_cased_finetuned_ner_bc2gm_iob_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_cased_finetuned_ner_bc2gm_iob_en.md new file mode 100644 index 00000000000000..473d6fc34f6ca8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_cased_finetuned_ner_bc2gm_iob_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_cased_finetuned_ner_bc2gm_iob BertForTokenClassification from DunnBC22 +author: John Snow Labs +name: bert_base_cased_finetuned_ner_bc2gm_iob +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_finetuned_ner_bc2gm_iob` is a English model originally trained by DunnBC22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_ner_bc2gm_iob_en_5.5.0_3.0_1727284110801.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_finetuned_ner_bc2gm_iob_en_5.5.0_3.0_1727284110801.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_base_cased_finetuned_ner_bc2gm_iob","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_base_cased_finetuned_ner_bc2gm_iob", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_finetuned_ner_bc2gm_iob| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/DunnBC22/bert-base-cased-finetuned-ner-BC2GM-IOB \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_cased_textcls_rheology_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_cased_textcls_rheology_pipeline_en.md new file mode 100644 index 00000000000000..69490b4cda4fd0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_cased_textcls_rheology_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_cased_textcls_rheology_pipeline pipeline BertForSequenceClassification from jonas-luehrs +author: John Snow Labs +name: bert_base_cased_textcls_rheology_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_cased_textcls_rheology_pipeline` is a English model originally trained by jonas-luehrs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_cased_textcls_rheology_pipeline_en_5.5.0_3.0_1727272973625.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_cased_textcls_rheology_pipeline_en_5.5.0_3.0_1727272973625.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_cased_textcls_rheology_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_cased_textcls_rheology_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_cased_textcls_rheology_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/jonas-luehrs/bert-base-cased-textCLS-RHEOLOGY + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_casedepoch3_sexist_baseline_with_reddit_and_gabfortest_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_casedepoch3_sexist_baseline_with_reddit_and_gabfortest_pipeline_en.md new file mode 100644 index 00000000000000..8f684b6dfa984b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_casedepoch3_sexist_baseline_with_reddit_and_gabfortest_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_casedepoch3_sexist_baseline_with_reddit_and_gabfortest_pipeline pipeline BertForSequenceClassification from Wiebke +author: John Snow Labs +name: bert_base_casedepoch3_sexist_baseline_with_reddit_and_gabfortest_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_casedepoch3_sexist_baseline_with_reddit_and_gabfortest_pipeline` is a English model originally trained by Wiebke. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_casedepoch3_sexist_baseline_with_reddit_and_gabfortest_pipeline_en_5.5.0_3.0_1727284527197.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_casedepoch3_sexist_baseline_with_reddit_and_gabfortest_pipeline_en_5.5.0_3.0_1727284527197.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_casedepoch3_sexist_baseline_with_reddit_and_gabfortest_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_casedepoch3_sexist_baseline_with_reddit_and_gabfortest_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_casedepoch3_sexist_baseline_with_reddit_and_gabfortest_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/Wiebke/bert-base-casedepoch3_sexist_baseline_with_reddit_and_gabfortest + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_chinese_climate_risk_opportunity_prediction_v4_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_chinese_climate_risk_opportunity_prediction_v4_en.md new file mode 100644 index 00000000000000..a8ec49ea35b2de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_chinese_climate_risk_opportunity_prediction_v4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_chinese_climate_risk_opportunity_prediction_v4 BertForSequenceClassification from hw2942 +author: John Snow Labs +name: bert_base_chinese_climate_risk_opportunity_prediction_v4 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_chinese_climate_risk_opportunity_prediction_v4` is a English model originally trained by hw2942. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_chinese_climate_risk_opportunity_prediction_v4_en_5.5.0_3.0_1727285677709.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_chinese_climate_risk_opportunity_prediction_v4_en_5.5.0_3.0_1727285677709.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_chinese_climate_risk_opportunity_prediction_v4","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_chinese_climate_risk_opportunity_prediction_v4", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_chinese_climate_risk_opportunity_prediction_v4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|383.3 MB| + +## References + +https://huggingface.co/hw2942/bert-base-chinese-climate-risk-opportunity-prediction-v4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_chinese_finetuning_financial_news_sentiment_zh.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_chinese_finetuning_financial_news_sentiment_zh.md new file mode 100644 index 00000000000000..e335614831ec3a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_chinese_finetuning_financial_news_sentiment_zh.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Chinese bert_base_chinese_finetuning_financial_news_sentiment BertForSequenceClassification from hw2942 +author: John Snow Labs +name: bert_base_chinese_finetuning_financial_news_sentiment +date: 2024-09-25 +tags: [zh, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_chinese_finetuning_financial_news_sentiment` is a Chinese model originally trained by hw2942. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuning_financial_news_sentiment_zh_5.5.0_3.0_1727279222615.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_chinese_finetuning_financial_news_sentiment_zh_5.5.0_3.0_1727279222615.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_chinese_finetuning_financial_news_sentiment","zh") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_chinese_finetuning_financial_news_sentiment", "zh") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_chinese_finetuning_financial_news_sentiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|zh| +|Size:|383.3 MB| + +## References + +https://huggingface.co/hw2942/bert-base-chinese-finetuning-financial-news-sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_code_classification_mid_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_code_classification_mid_pipeline_en.md new file mode 100644 index 00000000000000..cdc5e6e19a3242 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_code_classification_mid_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_finetuned_code_classification_mid_pipeline pipeline BertForSequenceClassification from JUNstats +author: John Snow Labs +name: bert_base_finetuned_code_classification_mid_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_finetuned_code_classification_mid_pipeline` is a English model originally trained by JUNstats. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_finetuned_code_classification_mid_pipeline_en_5.5.0_3.0_1727286164330.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_finetuned_code_classification_mid_pipeline_en_5.5.0_3.0_1727286164330.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_finetuned_code_classification_mid_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_finetuned_code_classification_mid_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_finetuned_code_classification_mid_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|414.8 MB| + +## References + +https://huggingface.co/JUNstats/bert-base-finetuned-code-classification-mid + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_masakhaner_amh_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_masakhaner_amh_en.md new file mode 100644 index 00000000000000..d07be6cb1cf7ce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_masakhaner_amh_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_finetuned_masakhaner_amh BertForTokenClassification from TokenfreeEMNLPSubmission +author: John Snow Labs +name: bert_base_finetuned_masakhaner_amh +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_finetuned_masakhaner_amh` is a English model originally trained by TokenfreeEMNLPSubmission. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_finetuned_masakhaner_amh_en_5.5.0_3.0_1727283797535.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_finetuned_masakhaner_amh_en_5.5.0_3.0_1727283797535.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_base_finetuned_masakhaner_amh","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_base_finetuned_masakhaner_amh", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_finetuned_masakhaner_amh| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/TokenfreeEMNLPSubmission/bert-base-finetuned-masakhaner-amh \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_masakhaner_amh_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_masakhaner_amh_pipeline_en.md new file mode 100644 index 00000000000000..faa43e947fa524 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_masakhaner_amh_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_finetuned_masakhaner_amh_pipeline pipeline BertForTokenClassification from TokenfreeEMNLPSubmission +author: John Snow Labs +name: bert_base_finetuned_masakhaner_amh_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_finetuned_masakhaner_amh_pipeline` is a English model originally trained by TokenfreeEMNLPSubmission. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_finetuned_masakhaner_amh_pipeline_en_5.5.0_3.0_1727283818894.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_finetuned_masakhaner_amh_pipeline_en_5.5.0_3.0_1727283818894.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_finetuned_masakhaner_amh_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_finetuned_masakhaner_amh_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_finetuned_masakhaner_amh_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/TokenfreeEMNLPSubmission/bert-base-finetuned-masakhaner-amh + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_sts_rurupang_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_sts_rurupang_en.md new file mode 100644 index 00000000000000..11d22145199121 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_sts_rurupang_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_finetuned_sts_rurupang BertForSequenceClassification from rurupang +author: John Snow Labs +name: bert_base_finetuned_sts_rurupang +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_finetuned_sts_rurupang` is a English model originally trained by rurupang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_finetuned_sts_rurupang_en_5.5.0_3.0_1727279480961.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_finetuned_sts_rurupang_en_5.5.0_3.0_1727279480961.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_finetuned_sts_rurupang","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_finetuned_sts_rurupang", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_finetuned_sts_rurupang| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|414.6 MB| + +## References + +https://huggingface.co/rurupang/bert-base-finetuned-sts \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_sts_rurupang_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_sts_rurupang_pipeline_en.md new file mode 100644 index 00000000000000..5d5f7983a81083 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_sts_rurupang_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_finetuned_sts_rurupang_pipeline pipeline BertForSequenceClassification from rurupang +author: John Snow Labs +name: bert_base_finetuned_sts_rurupang_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_finetuned_sts_rurupang_pipeline` is a English model originally trained by rurupang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_finetuned_sts_rurupang_pipeline_en_5.5.0_3.0_1727279502720.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_finetuned_sts_rurupang_pipeline_en_5.5.0_3.0_1727279502720.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_finetuned_sts_rurupang_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_finetuned_sts_rurupang_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_finetuned_sts_rurupang_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|414.7 MB| + +## References + +https://huggingface.co/rurupang/bert-base-finetuned-sts + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_ynat_zgotter_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_ynat_zgotter_en.md new file mode 100644 index 00000000000000..c754609d5786d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_ynat_zgotter_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_finetuned_ynat_zgotter BertForSequenceClassification from zgotter +author: John Snow Labs +name: bert_base_finetuned_ynat_zgotter +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_finetuned_ynat_zgotter` is a English model originally trained by zgotter. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_finetuned_ynat_zgotter_en_5.5.0_3.0_1727268694344.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_finetuned_ynat_zgotter_en_5.5.0_3.0_1727268694344.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_finetuned_ynat_zgotter","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_finetuned_ynat_zgotter", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_finetuned_ynat_zgotter| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|414.7 MB| + +## References + +https://huggingface.co/zgotter/bert-base-finetuned-ynat \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_ynat_zgotter_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_ynat_zgotter_pipeline_en.md new file mode 100644 index 00000000000000..e113568d0e3e33 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_finetuned_ynat_zgotter_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_finetuned_ynat_zgotter_pipeline pipeline BertForSequenceClassification from zgotter +author: John Snow Labs +name: bert_base_finetuned_ynat_zgotter_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_finetuned_ynat_zgotter_pipeline` is a English model originally trained by zgotter. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_finetuned_ynat_zgotter_pipeline_en_5.5.0_3.0_1727268718283.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_finetuned_ynat_zgotter_pipeline_en_5.5.0_3.0_1727268718283.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_finetuned_ynat_zgotter_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_finetuned_ynat_zgotter_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_finetuned_ynat_zgotter_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|414.7 MB| + +## References + +https://huggingface.co/zgotter/bert-base-finetuned-ynat + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_german_cased_archaeo_ner_pipeline_de.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_german_cased_archaeo_ner_pipeline_de.md new file mode 100644 index 00000000000000..2c0b1d4905fc3d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_german_cased_archaeo_ner_pipeline_de.md @@ -0,0 +1,70 @@ +--- +layout: model +title: German bert_base_german_cased_archaeo_ner_pipeline pipeline BertForTokenClassification from alexbrandsen +author: John Snow Labs +name: bert_base_german_cased_archaeo_ner_pipeline +date: 2024-09-25 +tags: [de, open_source, pipeline, onnx] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_german_cased_archaeo_ner_pipeline` is a German model originally trained by alexbrandsen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_german_cased_archaeo_ner_pipeline_de_5.5.0_3.0_1727246633648.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_german_cased_archaeo_ner_pipeline_de_5.5.0_3.0_1727246633648.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_german_cased_archaeo_ner_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_german_cased_archaeo_ner_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_german_cased_archaeo_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|406.9 MB| + +## References + +https://huggingface.co/alexbrandsen/bert-base-german-cased-archaeo-NER + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_german_cased_finetuned_subj_pretrained_with_noisydata_v2_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_german_cased_finetuned_subj_pretrained_with_noisydata_v2_en.md new file mode 100644 index 00000000000000..724709b01dd673 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_german_cased_finetuned_subj_pretrained_with_noisydata_v2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_german_cased_finetuned_subj_pretrained_with_noisydata_v2 BertForTokenClassification from tbosse +author: John Snow Labs +name: bert_base_german_cased_finetuned_subj_pretrained_with_noisydata_v2 +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_german_cased_finetuned_subj_pretrained_with_noisydata_v2` is a English model originally trained by tbosse. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_german_cased_finetuned_subj_pretrained_with_noisydata_v2_en_5.5.0_3.0_1727260502352.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_german_cased_finetuned_subj_pretrained_with_noisydata_v2_en_5.5.0_3.0_1727260502352.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_base_german_cased_finetuned_subj_pretrained_with_noisydata_v2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_base_german_cased_finetuned_subj_pretrained_with_noisydata_v2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_german_cased_finetuned_subj_pretrained_with_noisydata_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|406.9 MB| + +## References + +https://huggingface.co/tbosse/bert-base-german-cased-finetuned-subj_preTrained_with_noisyData_v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_german_cased_finetuned_subj_v6_7epoch_v3_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_german_cased_finetuned_subj_v6_7epoch_v3_en.md new file mode 100644 index 00000000000000..a1af598a3048f4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_german_cased_finetuned_subj_v6_7epoch_v3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_german_cased_finetuned_subj_v6_7epoch_v3 BertForTokenClassification from tbosse +author: John Snow Labs +name: bert_base_german_cased_finetuned_subj_v6_7epoch_v3 +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_german_cased_finetuned_subj_v6_7epoch_v3` is a English model originally trained by tbosse. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_german_cased_finetuned_subj_v6_7epoch_v3_en_5.5.0_3.0_1727284295244.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_german_cased_finetuned_subj_v6_7epoch_v3_en_5.5.0_3.0_1727284295244.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_base_german_cased_finetuned_subj_v6_7epoch_v3","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_base_german_cased_finetuned_subj_v6_7epoch_v3", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_german_cased_finetuned_subj_v6_7epoch_v3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|406.9 MB| + +## References + +https://huggingface.co/tbosse/bert-base-german-cased-finetuned-subj_v6_7Epoch_v3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_german_cased_noisy_pretrain_fine_tuned_v1_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_german_cased_noisy_pretrain_fine_tuned_v1_2_pipeline_en.md new file mode 100644 index 00000000000000..ddc2658715bda6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_german_cased_noisy_pretrain_fine_tuned_v1_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_german_cased_noisy_pretrain_fine_tuned_v1_2_pipeline pipeline BertForTokenClassification from tbosse +author: John Snow Labs +name: bert_base_german_cased_noisy_pretrain_fine_tuned_v1_2_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_german_cased_noisy_pretrain_fine_tuned_v1_2_pipeline` is a English model originally trained by tbosse. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_german_cased_noisy_pretrain_fine_tuned_v1_2_pipeline_en_5.5.0_3.0_1727280917448.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_german_cased_noisy_pretrain_fine_tuned_v1_2_pipeline_en_5.5.0_3.0_1727280917448.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_german_cased_noisy_pretrain_fine_tuned_v1_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_german_cased_noisy_pretrain_fine_tuned_v1_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_german_cased_noisy_pretrain_fine_tuned_v1_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.9 MB| + +## References + +https://huggingface.co/tbosse/bert-base-german-cased-noisy-pretrain-fine-tuned_v1.2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_italian_xxl_uncased_finetuned_emotions_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_italian_xxl_uncased_finetuned_emotions_en.md new file mode 100644 index 00000000000000..398a31dad69102 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_italian_xxl_uncased_finetuned_emotions_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_italian_xxl_uncased_finetuned_emotions BertForSequenceClassification from MelmaGrigia +author: John Snow Labs +name: bert_base_italian_xxl_uncased_finetuned_emotions +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_italian_xxl_uncased_finetuned_emotions` is a English model originally trained by MelmaGrigia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_italian_xxl_uncased_finetuned_emotions_en_5.5.0_3.0_1727222474327.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_italian_xxl_uncased_finetuned_emotions_en_5.5.0_3.0_1727222474327.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_italian_xxl_uncased_finetuned_emotions","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_italian_xxl_uncased_finetuned_emotions", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_italian_xxl_uncased_finetuned_emotions| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|414.8 MB| + +## References + +https://huggingface.co/MelmaGrigia/bert-base-italian-xxl-uncased-finetuned-emotions \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_massive_intent_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_massive_intent_pipeline_en.md new file mode 100644 index 00000000000000..7848de87d384dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_massive_intent_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_massive_intent_pipeline pipeline BertForSequenceClassification from gokuls +author: John Snow Labs +name: bert_base_massive_intent_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_massive_intent_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_massive_intent_pipeline_en_5.5.0_3.0_1727273214064.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_massive_intent_pipeline_en_5.5.0_3.0_1727273214064.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_massive_intent_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_massive_intent_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_massive_intent_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.6 MB| + +## References + +https://huggingface.co/gokuls/bert-base-Massive-intent + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_msmarco_fiqa_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_msmarco_fiqa_en.md new file mode 100644 index 00000000000000..9220161ec6677c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_msmarco_fiqa_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_msmarco_fiqa BertForSequenceClassification from vittoriomaggio +author: John Snow Labs +name: bert_base_msmarco_fiqa +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_msmarco_fiqa` is a English model originally trained by vittoriomaggio. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_msmarco_fiqa_en_5.5.0_3.0_1727273470072.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_msmarco_fiqa_en_5.5.0_3.0_1727273470072.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_msmarco_fiqa","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_msmarco_fiqa", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_msmarco_fiqa| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/vittoriomaggio/bert-base-msmarco-fiqa \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_cased_finetuned_ner_geocorpus_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_cased_finetuned_ner_geocorpus_pipeline_xx.md new file mode 100644 index 00000000000000..f8c863147c5bea --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_cased_finetuned_ner_geocorpus_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_cased_finetuned_ner_geocorpus_pipeline pipeline BertForTokenClassification from GuiTap +author: John Snow Labs +name: bert_base_multilingual_cased_finetuned_ner_geocorpus_pipeline +date: 2024-09-25 +tags: [xx, open_source, pipeline, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_cased_finetuned_ner_geocorpus_pipeline` is a Multilingual model originally trained by GuiTap. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_finetuned_ner_geocorpus_pipeline_xx_5.5.0_3.0_1727249902683.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_finetuned_ner_geocorpus_pipeline_xx_5.5.0_3.0_1727249902683.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_multilingual_cased_finetuned_ner_geocorpus_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_multilingual_cased_finetuned_ner_geocorpus_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_cased_finetuned_ner_geocorpus_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|665.2 MB| + +## References + +https://huggingface.co/GuiTap/bert-base-multilingual-cased-finetuned-ner-geocorpus + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_cased_wnli_1_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_cased_wnli_1_pipeline_xx.md new file mode 100644 index 00000000000000..a00f334e07226a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_cased_wnli_1_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_cased_wnli_1_pipeline pipeline BertForSequenceClassification from tmnam20 +author: John Snow Labs +name: bert_base_multilingual_cased_wnli_1_pipeline +date: 2024-09-25 +tags: [xx, open_source, pipeline, onnx] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_cased_wnli_1_pipeline` is a Multilingual model originally trained by tmnam20. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_wnli_1_pipeline_xx_5.5.0_3.0_1727285032268.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_cased_wnli_1_pipeline_xx_5.5.0_3.0_1727285032268.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_multilingual_cased_wnli_1_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_multilingual_cased_wnli_1_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_cased_wnli_1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|667.3 MB| + +## References + +https://huggingface.co/tmnam20/bert-base-multilingual-cased-wnli-1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_finetuned_for_multilang_ner_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_finetuned_for_multilang_ner_pipeline_xx.md new file mode 100644 index 00000000000000..338aadf28b0e3c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_finetuned_for_multilang_ner_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_uncased_finetuned_for_multilang_ner_pipeline pipeline BertForTokenClassification from Misha24-10 +author: John Snow Labs +name: bert_base_multilingual_uncased_finetuned_for_multilang_ner_pipeline +date: 2024-09-25 +tags: [xx, open_source, pipeline, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_uncased_finetuned_for_multilang_ner_pipeline` is a Multilingual model originally trained by Misha24-10. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_finetuned_for_multilang_ner_pipeline_xx_5.5.0_3.0_1727275975615.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_finetuned_for_multilang_ner_pipeline_xx_5.5.0_3.0_1727275975615.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_multilingual_uncased_finetuned_for_multilang_ner_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_multilingual_uncased_finetuned_for_multilang_ner_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_uncased_finetuned_for_multilang_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|625.8 MB| + +## References + +https://huggingface.co/Misha24-10/bert-base-multilingual-uncased-finetuned-for-multilang-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_finetuned_for_multilang_ner_xx.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_finetuned_for_multilang_ner_xx.md new file mode 100644 index 00000000000000..de9764642e5fa2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_finetuned_for_multilang_ner_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_uncased_finetuned_for_multilang_ner BertForTokenClassification from Misha24-10 +author: John Snow Labs +name: bert_base_multilingual_uncased_finetuned_for_multilang_ner +date: 2024-09-25 +tags: [xx, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_uncased_finetuned_for_multilang_ner` is a Multilingual model originally trained by Misha24-10. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_finetuned_for_multilang_ner_xx_5.5.0_3.0_1727275943080.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_finetuned_for_multilang_ner_xx_5.5.0_3.0_1727275943080.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_base_multilingual_uncased_finetuned_for_multilang_ner","xx") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_base_multilingual_uncased_finetuned_for_multilang_ner", "xx") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_uncased_finetuned_for_multilang_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|xx| +|Size:|625.7 MB| + +## References + +https://huggingface.co/Misha24-10/bert-base-multilingual-uncased-finetuned-for-multilang-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_finetuned_masress_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_finetuned_masress_pipeline_xx.md new file mode 100644 index 00000000000000..8355eced5abfb0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_finetuned_masress_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_uncased_finetuned_masress_pipeline pipeline BertForSequenceClassification from cjbarrie +author: John Snow Labs +name: bert_base_multilingual_uncased_finetuned_masress_pipeline +date: 2024-09-25 +tags: [xx, open_source, pipeline, onnx] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_uncased_finetuned_masress_pipeline` is a Multilingual model originally trained by cjbarrie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_finetuned_masress_pipeline_xx_5.5.0_3.0_1727257527230.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_finetuned_masress_pipeline_xx_5.5.0_3.0_1727257527230.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_multilingual_uncased_finetuned_masress_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_multilingual_uncased_finetuned_masress_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_uncased_finetuned_masress_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|627.8 MB| + +## References + +https://huggingface.co/cjbarrie/bert-base-multilingual-uncased-finetuned-masress + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_finetuned_masress_xx.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_finetuned_masress_xx.md new file mode 100644 index 00000000000000..2409bfd2ddd013 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_finetuned_masress_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_uncased_finetuned_masress BertForSequenceClassification from cjbarrie +author: John Snow Labs +name: bert_base_multilingual_uncased_finetuned_masress +date: 2024-09-25 +tags: [xx, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_uncased_finetuned_masress` is a Multilingual model originally trained by cjbarrie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_finetuned_masress_xx_5.5.0_3.0_1727257494505.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_finetuned_masress_xx_5.5.0_3.0_1727257494505.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_multilingual_uncased_finetuned_masress","xx") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_multilingual_uncased_finetuned_masress", "xx") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_uncased_finetuned_masress| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|xx| +|Size:|627.7 MB| + +## References + +https://huggingface.co/cjbarrie/bert-base-multilingual-uncased-finetuned-masress \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_ner_silvanus_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_ner_silvanus_pipeline_xx.md new file mode 100644 index 00000000000000..4799cd6723bec5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_ner_silvanus_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_uncased_ner_silvanus_pipeline pipeline BertForTokenClassification from rollerhafeezh-amikom +author: John Snow Labs +name: bert_base_multilingual_uncased_ner_silvanus_pipeline +date: 2024-09-25 +tags: [xx, open_source, pipeline, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_uncased_ner_silvanus_pipeline` is a Multilingual model originally trained by rollerhafeezh-amikom. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_ner_silvanus_pipeline_xx_5.5.0_3.0_1727247866149.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_ner_silvanus_pipeline_xx_5.5.0_3.0_1727247866149.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_multilingual_uncased_ner_silvanus_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_multilingual_uncased_ner_silvanus_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_uncased_ner_silvanus_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|625.6 MB| + +## References + +https://huggingface.co/rollerhafeezh-amikom/bert-base-multilingual-uncased-ner-silvanus + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_ner_silvanus_xx.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_ner_silvanus_xx.md new file mode 100644 index 00000000000000..89d3ec1cdb3c91 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_ner_silvanus_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_uncased_ner_silvanus BertForTokenClassification from rollerhafeezh-amikom +author: John Snow Labs +name: bert_base_multilingual_uncased_ner_silvanus +date: 2024-09-25 +tags: [xx, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_uncased_ner_silvanus` is a Multilingual model originally trained by rollerhafeezh-amikom. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_ner_silvanus_xx_5.5.0_3.0_1727247832780.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_ner_silvanus_xx_5.5.0_3.0_1727247832780.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_base_multilingual_uncased_ner_silvanus","xx") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_base_multilingual_uncased_ner_silvanus", "xx") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_uncased_ner_silvanus| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|xx| +|Size:|625.5 MB| + +## References + +https://huggingface.co/rollerhafeezh-amikom/bert-base-multilingual-uncased-ner-silvanus \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym_pipeline_xx.md new file mode 100644 index 00000000000000..2d7fdeef57314f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym_pipeline pipeline BertForSequenceClassification from beamandym +author: John Snow Labs +name: bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym_pipeline +date: 2024-09-25 +tags: [xx, open_source, pipeline, onnx] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym_pipeline` is a Multilingual model originally trained by beamandym. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym_pipeline_xx_5.5.0_3.0_1727237764695.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym_pipeline_xx_5.5.0_3.0_1727237764695.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|627.8 MB| + +## References + +https://huggingface.co/beamandym/bert-base-multilingual-uncased-sentiment-finetuned-MeIA-AnalisisDeSentimientos + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym_xx.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym_xx.md new file mode 100644 index 00000000000000..af3f194994e60c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym BertForSequenceClassification from beamandym +author: John Snow Labs +name: bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym +date: 2024-09-25 +tags: [xx, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym` is a Multilingual model originally trained by beamandym. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym_xx_5.5.0_3.0_1727237732984.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym_xx_5.5.0_3.0_1727237732984.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym","xx") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym", "xx") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_beamandym| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|xx| +|Size:|627.8 MB| + +## References + +https://huggingface.co/beamandym/bert-base-multilingual-uncased-sentiment-finetuned-MeIA-AnalisisDeSentimientos \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze_pipeline_xx.md new file mode 100644 index 00000000000000..d286bb0c0b4488 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze_pipeline pipeline BertForSequenceClassification from Jumartineze +author: John Snow Labs +name: bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze_pipeline +date: 2024-09-25 +tags: [xx, open_source, pipeline, onnx] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze_pipeline` is a Multilingual model originally trained by Jumartineze. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze_pipeline_xx_5.5.0_3.0_1727276569808.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze_pipeline_xx_5.5.0_3.0_1727276569808.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|627.8 MB| + +## References + +https://huggingface.co/Jumartineze/bert-base-multilingual-uncased-sentiment-finetuned-MeIA-AnalisisDeSentimientos + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze_xx.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze_xx.md new file mode 100644 index 00000000000000..6997ac25e6b9d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze BertForSequenceClassification from Jumartineze +author: John Snow Labs +name: bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze +date: 2024-09-25 +tags: [xx, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze` is a Multilingual model originally trained by Jumartineze. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze_xx_5.5.0_3.0_1727276535539.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze_xx_5.5.0_3.0_1727276535539.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze","xx") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze", "xx") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_uncased_sentiment_finetuned_meia_analisisdesentimientos_jumartineze| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|xx| +|Size:|627.8 MB| + +## References + +https://huggingface.co/Jumartineze/bert-base-multilingual-uncased-sentiment-finetuned-MeIA-AnalisisDeSentimientos \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_sentiment_finetuned_qqp_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_sentiment_finetuned_qqp_pipeline_xx.md new file mode 100644 index 00000000000000..a1f4b74181d443 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_sentiment_finetuned_qqp_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_uncased_sentiment_finetuned_qqp_pipeline pipeline BertForSequenceClassification from anuj55 +author: John Snow Labs +name: bert_base_multilingual_uncased_sentiment_finetuned_qqp_pipeline +date: 2024-09-25 +tags: [xx, open_source, pipeline, onnx] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_uncased_sentiment_finetuned_qqp_pipeline` is a Multilingual model originally trained by anuj55. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_sentiment_finetuned_qqp_pipeline_xx_5.5.0_3.0_1727272900504.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_sentiment_finetuned_qqp_pipeline_xx_5.5.0_3.0_1727272900504.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_multilingual_uncased_sentiment_finetuned_qqp_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_multilingual_uncased_sentiment_finetuned_qqp_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_uncased_sentiment_finetuned_qqp_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|627.8 MB| + +## References + +https://huggingface.co/anuj55/bert-base-multilingual-uncased-sentiment-finetuned-qqp + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_sentiment_finetuned_qqp_xx.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_sentiment_finetuned_qqp_xx.md new file mode 100644 index 00000000000000..14f7004cf652ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_sentiment_finetuned_qqp_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_uncased_sentiment_finetuned_qqp BertForSequenceClassification from anuj55 +author: John Snow Labs +name: bert_base_multilingual_uncased_sentiment_finetuned_qqp +date: 2024-09-25 +tags: [xx, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_uncased_sentiment_finetuned_qqp` is a Multilingual model originally trained by anuj55. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_sentiment_finetuned_qqp_xx_5.5.0_3.0_1727272865915.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_sentiment_finetuned_qqp_xx_5.5.0_3.0_1727272865915.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_multilingual_uncased_sentiment_finetuned_qqp","xx") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_multilingual_uncased_sentiment_finetuned_qqp", "xx") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_uncased_sentiment_finetuned_qqp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|xx| +|Size:|627.8 MB| + +## References + +https://huggingface.co/anuj55/bert-base-multilingual-uncased-sentiment-finetuned-qqp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_vaxxstance_spanish_xx.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_vaxxstance_spanish_xx.md new file mode 100644 index 00000000000000..a6cd122b45b0b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_multilingual_uncased_vaxxstance_spanish_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual bert_base_multilingual_uncased_vaxxstance_spanish BertForSequenceClassification from nouman-10 +author: John Snow Labs +name: bert_base_multilingual_uncased_vaxxstance_spanish +date: 2024-09-25 +tags: [xx, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_multilingual_uncased_vaxxstance_spanish` is a Multilingual model originally trained by nouman-10. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_vaxxstance_spanish_xx_5.5.0_3.0_1727277454316.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_multilingual_uncased_vaxxstance_spanish_xx_5.5.0_3.0_1727277454316.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_multilingual_uncased_vaxxstance_spanish","xx") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_multilingual_uncased_vaxxstance_spanish", "xx") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_multilingual_uncased_vaxxstance_spanish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|xx| +|Size:|627.7 MB| + +## References + +https://huggingface.co/nouman-10/bert-base-multilingual-uncased_vaxxstance_spanish \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_nlp100_title_classification_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_nlp100_title_classification_en.md new file mode 100644 index 00000000000000..c66436fbec6a2f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_nlp100_title_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_nlp100_title_classification BertForSequenceClassification from udaizin +author: John Snow Labs +name: bert_base_nlp100_title_classification +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_nlp100_title_classification` is a English model originally trained by udaizin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_nlp100_title_classification_en_5.5.0_3.0_1727268187429.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_nlp100_title_classification_en_5.5.0_3.0_1727268187429.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_nlp100_title_classification","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_nlp100_title_classification", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_nlp100_title_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/udaizin/bert-base-nlp100_title_classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_portuguese_cased_assin_similarity_pipeline_pt.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_portuguese_cased_assin_similarity_pipeline_pt.md new file mode 100644 index 00000000000000..67270861055020 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_portuguese_cased_assin_similarity_pipeline_pt.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Portuguese bert_base_portuguese_cased_assin_similarity_pipeline pipeline BertForSequenceClassification from ruanchaves +author: John Snow Labs +name: bert_base_portuguese_cased_assin_similarity_pipeline +date: 2024-09-25 +tags: [pt, open_source, pipeline, onnx] +task: Text Classification +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_portuguese_cased_assin_similarity_pipeline` is a Portuguese model originally trained by ruanchaves. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_portuguese_cased_assin_similarity_pipeline_pt_5.5.0_3.0_1727267089381.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_portuguese_cased_assin_similarity_pipeline_pt_5.5.0_3.0_1727267089381.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_portuguese_cased_assin_similarity_pipeline", lang = "pt") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_portuguese_cased_assin_similarity_pipeline", lang = "pt") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_portuguese_cased_assin_similarity_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|pt| +|Size:|408.2 MB| + +## References + +https://huggingface.co/ruanchaves/bert-base-portuguese-cased-assin-similarity + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_portuguese_cased_assin_similarity_pt.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_portuguese_cased_assin_similarity_pt.md new file mode 100644 index 00000000000000..750667876d3704 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_portuguese_cased_assin_similarity_pt.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Portuguese bert_base_portuguese_cased_assin_similarity BertForSequenceClassification from ruanchaves +author: John Snow Labs +name: bert_base_portuguese_cased_assin_similarity +date: 2024-09-25 +tags: [pt, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_portuguese_cased_assin_similarity` is a Portuguese model originally trained by ruanchaves. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_portuguese_cased_assin_similarity_pt_5.5.0_3.0_1727267066594.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_portuguese_cased_assin_similarity_pt_5.5.0_3.0_1727267066594.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_portuguese_cased_assin_similarity","pt") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_portuguese_cased_assin_similarity", "pt") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_portuguese_cased_assin_similarity| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|pt| +|Size:|408.2 MB| + +## References + +https://huggingface.co/ruanchaves/bert-base-portuguese-cased-assin-similarity \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_portuguese_cased_porsimplessent_pt.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_portuguese_cased_porsimplessent_pt.md new file mode 100644 index 00000000000000..d261a8f154f6b6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_portuguese_cased_porsimplessent_pt.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Portuguese bert_base_portuguese_cased_porsimplessent BertForSequenceClassification from ruanchaves +author: John Snow Labs +name: bert_base_portuguese_cased_porsimplessent +date: 2024-09-25 +tags: [pt, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_portuguese_cased_porsimplessent` is a Portuguese model originally trained by ruanchaves. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_portuguese_cased_porsimplessent_pt_5.5.0_3.0_1727253718165.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_portuguese_cased_porsimplessent_pt_5.5.0_3.0_1727253718165.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_portuguese_cased_porsimplessent","pt") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_portuguese_cased_porsimplessent", "pt") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_portuguese_cased_porsimplessent| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|pt| +|Size:|408.2 MB| + +## References + +https://huggingface.co/ruanchaves/bert-base-portuguese-cased-porsimplessent \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_portuguese_fine_tuned_mrpc_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_portuguese_fine_tuned_mrpc_en.md new file mode 100644 index 00000000000000..35c10ed6246100 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_portuguese_fine_tuned_mrpc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_portuguese_fine_tuned_mrpc BertForSequenceClassification from erickrribeiro +author: John Snow Labs +name: bert_base_portuguese_fine_tuned_mrpc +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_portuguese_fine_tuned_mrpc` is a English model originally trained by erickrribeiro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_portuguese_fine_tuned_mrpc_en_5.5.0_3.0_1727273584144.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_portuguese_fine_tuned_mrpc_en_5.5.0_3.0_1727273584144.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_portuguese_fine_tuned_mrpc","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_portuguese_fine_tuned_mrpc", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_portuguese_fine_tuned_mrpc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|408.2 MB| + +## References + +https://huggingface.co/erickrribeiro/bert-base-portuguese-fine-tuned-mrpc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_sayula_popoluca_theseus_bulgarian_bg.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_sayula_popoluca_theseus_bulgarian_bg.md new file mode 100644 index 00000000000000..0872b08e8ee471 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_sayula_popoluca_theseus_bulgarian_bg.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Bulgarian bert_base_sayula_popoluca_theseus_bulgarian BertForTokenClassification from rmihaylov +author: John Snow Labs +name: bert_base_sayula_popoluca_theseus_bulgarian +date: 2024-09-25 +tags: [bg, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: bg +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_sayula_popoluca_theseus_bulgarian` is a Bulgarian model originally trained by rmihaylov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_sayula_popoluca_theseus_bulgarian_bg_5.5.0_3.0_1727274959042.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_sayula_popoluca_theseus_bulgarian_bg_5.5.0_3.0_1727274959042.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_base_sayula_popoluca_theseus_bulgarian","bg") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_base_sayula_popoluca_theseus_bulgarian", "bg") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_sayula_popoluca_theseus_bulgarian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|bg| +|Size:|505.5 MB| + +## References + +https://huggingface.co/rmihaylov/bert-base-pos-theseus-bg \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_spanish_wwm_cased_meddocan_es.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_spanish_wwm_cased_meddocan_es.md new file mode 100644 index 00000000000000..3292d3d84ea096 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_spanish_wwm_cased_meddocan_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish bert_base_spanish_wwm_cased_meddocan BertForTokenClassification from IIC +author: John Snow Labs +name: bert_base_spanish_wwm_cased_meddocan +date: 2024-09-25 +tags: [es, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_spanish_wwm_cased_meddocan` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_cased_meddocan_es_5.5.0_3.0_1727265268046.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_cased_meddocan_es_5.5.0_3.0_1727265268046.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_base_spanish_wwm_cased_meddocan","es") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_base_spanish_wwm_cased_meddocan", "es") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_spanish_wwm_cased_meddocan| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|409.6 MB| + +## References + +https://huggingface.co/IIC/bert-base-spanish-wwm-cased-meddocan \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_spanish_wwm_cased_meddocan_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_spanish_wwm_cased_meddocan_pipeline_es.md new file mode 100644 index 00000000000000..18582e58dba1b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_spanish_wwm_cased_meddocan_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish bert_base_spanish_wwm_cased_meddocan_pipeline pipeline BertForTokenClassification from IIC +author: John Snow Labs +name: bert_base_spanish_wwm_cased_meddocan_pipeline +date: 2024-09-25 +tags: [es, open_source, pipeline, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_spanish_wwm_cased_meddocan_pipeline` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_cased_meddocan_pipeline_es_5.5.0_3.0_1727265289796.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_cased_meddocan_pipeline_es_5.5.0_3.0_1727265289796.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_spanish_wwm_cased_meddocan_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_spanish_wwm_cased_meddocan_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_spanish_wwm_cased_meddocan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|409.6 MB| + +## References + +https://huggingface.co/IIC/bert-base-spanish-wwm-cased-meddocan + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_spanish_wwm_cased_socialdisner_es.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_spanish_wwm_cased_socialdisner_es.md new file mode 100644 index 00000000000000..91e0e311e197e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_spanish_wwm_cased_socialdisner_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish bert_base_spanish_wwm_cased_socialdisner BertForTokenClassification from IIC +author: John Snow Labs +name: bert_base_spanish_wwm_cased_socialdisner +date: 2024-09-25 +tags: [es, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_spanish_wwm_cased_socialdisner` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_cased_socialdisner_es_5.5.0_3.0_1727284156369.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_cased_socialdisner_es_5.5.0_3.0_1727284156369.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_base_spanish_wwm_cased_socialdisner","es") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_base_spanish_wwm_cased_socialdisner", "es") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_spanish_wwm_cased_socialdisner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|409.5 MB| + +## References + +https://huggingface.co/IIC/bert-base-spanish-wwm-cased-socialdisner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_spanish_wwm_cased_socialdisner_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_spanish_wwm_cased_socialdisner_pipeline_es.md new file mode 100644 index 00000000000000..f5943e1434c66b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_spanish_wwm_cased_socialdisner_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish bert_base_spanish_wwm_cased_socialdisner_pipeline pipeline BertForTokenClassification from IIC +author: John Snow Labs +name: bert_base_spanish_wwm_cased_socialdisner_pipeline +date: 2024-09-25 +tags: [es, open_source, pipeline, onnx] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_spanish_wwm_cased_socialdisner_pipeline` is a Castilian, Spanish model originally trained by IIC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_cased_socialdisner_pipeline_es_5.5.0_3.0_1727284181712.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_cased_socialdisner_pipeline_es_5.5.0_3.0_1727284181712.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_spanish_wwm_cased_socialdisner_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_spanish_wwm_cased_socialdisner_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_spanish_wwm_cased_socialdisner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|409.5 MB| + +## References + +https://huggingface.co/IIC/bert-base-spanish-wwm-cased-socialdisner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_spanish_wwm_uncased_finetuned_sayula_popoluca_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_spanish_wwm_uncased_finetuned_sayula_popoluca_pipeline_en.md new file mode 100644 index 00000000000000..ac72248b6c5bbc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_spanish_wwm_uncased_finetuned_sayula_popoluca_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_spanish_wwm_uncased_finetuned_sayula_popoluca_pipeline pipeline BertForTokenClassification from dccuchile +author: John Snow Labs +name: bert_base_spanish_wwm_uncased_finetuned_sayula_popoluca_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_spanish_wwm_uncased_finetuned_sayula_popoluca_pipeline` is a English model originally trained by dccuchile. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_uncased_finetuned_sayula_popoluca_pipeline_en_5.5.0_3.0_1727271585281.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_spanish_wwm_uncased_finetuned_sayula_popoluca_pipeline_en_5.5.0_3.0_1727271585281.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_spanish_wwm_uncased_finetuned_sayula_popoluca_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_spanish_wwm_uncased_finetuned_sayula_popoluca_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_spanish_wwm_uncased_finetuned_sayula_popoluca_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.7 MB| + +## References + +https://huggingface.co/dccuchile/bert-base-spanish-wwm-uncased-finetuned-pos + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_sst_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_sst_pipeline_en.md new file mode 100644 index 00000000000000..ca1855bbc9e289 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_sst_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_sst_pipeline pipeline BertForSequenceClassification from hugmanskj +author: John Snow Labs +name: bert_base_sst_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_sst_pipeline` is a English model originally trained by hugmanskj. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_sst_pipeline_en_5.5.0_3.0_1727286638169.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_sst_pipeline_en_5.5.0_3.0_1727286638169.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_sst_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_sst_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_sst_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/hugmanskj/bert-base-sst + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_temp_classifier_boot_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_temp_classifier_boot_pipeline_en.md new file mode 100644 index 00000000000000..318d08e3b1b823 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_temp_classifier_boot_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_temp_classifier_boot_pipeline pipeline BertForSequenceClassification from research-dump +author: John Snow Labs +name: bert_base_temp_classifier_boot_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_temp_classifier_boot_pipeline` is a English model originally trained by research-dump. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_temp_classifier_boot_pipeline_en_5.5.0_3.0_1727288190931.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_temp_classifier_boot_pipeline_en_5.5.0_3.0_1727288190931.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_temp_classifier_boot_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_temp_classifier_boot_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_temp_classifier_boot_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/research-dump/bert_base_temp_classifier_boot + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_theseus_bulgarian_bg.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_theseus_bulgarian_bg.md new file mode 100644 index 00000000000000..d45147a67b9d44 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_theseus_bulgarian_bg.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Bulgarian bert_base_theseus_bulgarian BertEmbeddings from rmihaylov +author: John Snow Labs +name: bert_base_theseus_bulgarian +date: 2024-09-25 +tags: [bg, open_source, onnx, embeddings, bert] +task: Embeddings +language: bg +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_theseus_bulgarian` is a Bulgarian model originally trained by rmihaylov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_theseus_bulgarian_bg_5.5.0_3.0_1727258333284.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_theseus_bulgarian_bg_5.5.0_3.0_1727258333284.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_base_theseus_bulgarian","bg") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_base_theseus_bulgarian","bg") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_theseus_bulgarian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|bg| +|Size:|505.4 MB| + +## References + +https://huggingface.co/rmihaylov/bert-base-theseus-bg \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_theseus_bulgarian_pipeline_bg.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_theseus_bulgarian_pipeline_bg.md new file mode 100644 index 00000000000000..63f7a4269191ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_theseus_bulgarian_pipeline_bg.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Bulgarian bert_base_theseus_bulgarian_pipeline pipeline BertEmbeddings from rmihaylov +author: John Snow Labs +name: bert_base_theseus_bulgarian_pipeline +date: 2024-09-25 +tags: [bg, open_source, pipeline, onnx] +task: Embeddings +language: bg +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_theseus_bulgarian_pipeline` is a Bulgarian model originally trained by rmihaylov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_theseus_bulgarian_pipeline_bg_5.5.0_3.0_1727258359737.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_theseus_bulgarian_pipeline_bg_5.5.0_3.0_1727258359737.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_theseus_bulgarian_pipeline", lang = "bg") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_theseus_bulgarian_pipeline", lang = "bg") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_theseus_bulgarian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|bg| +|Size:|505.4 MB| + +## References + +https://huggingface.co/rmihaylov/bert-base-theseus-bg + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_turkish_cased_finetuned_ner_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_turkish_cased_finetuned_ner_en.md new file mode 100644 index 00000000000000..aee3808816a42f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_turkish_cased_finetuned_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_turkish_cased_finetuned_ner BertForTokenClassification from ugrozkr +author: John Snow Labs +name: bert_base_turkish_cased_finetuned_ner +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_turkish_cased_finetuned_ner` is a English model originally trained by ugrozkr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_turkish_cased_finetuned_ner_en_5.5.0_3.0_1727262491540.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_turkish_cased_finetuned_ner_en_5.5.0_3.0_1727262491540.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_base_turkish_cased_finetuned_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_base_turkish_cased_finetuned_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_turkish_cased_finetuned_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|412.3 MB| + +## References + +https://huggingface.co/ugrozkr/bert-base-turkish-cased-finetuned-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_turkish_cased_finetuned_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_turkish_cased_finetuned_ner_pipeline_en.md new file mode 100644 index 00000000000000..e15bd2dcc387f0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_turkish_cased_finetuned_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_turkish_cased_finetuned_ner_pipeline pipeline BertForTokenClassification from ugrozkr +author: John Snow Labs +name: bert_base_turkish_cased_finetuned_ner_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_turkish_cased_finetuned_ner_pipeline` is a English model originally trained by ugrozkr. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_turkish_cased_finetuned_ner_pipeline_en_5.5.0_3.0_1727262513423.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_turkish_cased_finetuned_ner_pipeline_en_5.5.0_3.0_1727262513423.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_turkish_cased_finetuned_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_turkish_cased_finetuned_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_turkish_cased_finetuned_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|412.3 MB| + +## References + +https://huggingface.co/ugrozkr/bert-base-turkish-cased-finetuned-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_tweetner7_2020_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_tweetner7_2020_pipeline_en.md new file mode 100644 index 00000000000000..541035ba2e944e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_tweetner7_2020_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_tweetner7_2020_pipeline pipeline BertForTokenClassification from tner +author: John Snow Labs +name: bert_base_tweetner7_2020_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_tweetner7_2020_pipeline` is a English model originally trained by tner. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_tweetner7_2020_pipeline_en_5.5.0_3.0_1727264846427.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_tweetner7_2020_pipeline_en_5.5.0_3.0_1727264846427.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_tweetner7_2020_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_tweetner7_2020_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_tweetner7_2020_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/tner/bert-base-tweetner7-2020 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_1802_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_1802_en.md new file mode 100644 index 00000000000000..fb9e3b4ff22450 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_1802_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_1802 BertEmbeddings from JamesKim +author: John Snow Labs +name: bert_base_uncased_1802 +date: 2024-09-25 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_1802` is a English model originally trained by JamesKim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_1802_en_5.5.0_3.0_1727256384007.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_1802_en_5.5.0_3.0_1727256384007.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_base_uncased_1802","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_base_uncased_1802","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_1802| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/JamesKim/bert-base-uncased_1802 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_1802_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_1802_pipeline_en.md new file mode 100644 index 00000000000000..9a85a1e8184bb8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_1802_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_1802_pipeline pipeline BertEmbeddings from JamesKim +author: John Snow Labs +name: bert_base_uncased_1802_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_1802_pipeline` is a English model originally trained by JamesKim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_1802_pipeline_en_5.5.0_3.0_1727256405375.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_1802_pipeline_en_5.5.0_3.0_1727256405375.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_1802_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_1802_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_1802_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/JamesKim/bert-base-uncased_1802 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_1802_r2_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_1802_r2_en.md new file mode 100644 index 00000000000000..a2ee3d6f34a07c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_1802_r2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_1802_r2 BertEmbeddings from JamesKim +author: John Snow Labs +name: bert_base_uncased_1802_r2 +date: 2024-09-25 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_1802_r2` is a English model originally trained by JamesKim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_1802_r2_en_5.5.0_3.0_1727236503350.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_1802_r2_en_5.5.0_3.0_1727236503350.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_base_uncased_1802_r2","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_base_uncased_1802_r2","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_1802_r2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/JamesKim/bert-base-uncased_1802_r2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_1802_r2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_1802_r2_pipeline_en.md new file mode 100644 index 00000000000000..68cd6277731f39 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_1802_r2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_1802_r2_pipeline pipeline BertEmbeddings from JamesKim +author: John Snow Labs +name: bert_base_uncased_1802_r2_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_1802_r2_pipeline` is a English model originally trained by JamesKim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_1802_r2_pipeline_en_5.5.0_3.0_1727236525159.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_1802_r2_pipeline_en_5.5.0_3.0_1727236525159.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_1802_r2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_1802_r2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_1802_r2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/JamesKim/bert-base-uncased_1802_r2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_8_50_0_01_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_8_50_0_01_en.md new file mode 100644 index 00000000000000..0b6436310aa5e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_8_50_0_01_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_8_50_0_01 BertForSequenceClassification from daisyxie21 +author: John Snow Labs +name: bert_base_uncased_8_50_0_01 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_8_50_0_01` is a English model originally trained by daisyxie21. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_8_50_0_01_en_5.5.0_3.0_1727276675095.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_8_50_0_01_en_5.5.0_3.0_1727276675095.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_8_50_0_01","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_8_50_0_01", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_8_50_0_01| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|406.4 MB| + +## References + +https://huggingface.co/daisyxie21/bert-base-uncased-8-50-0.01 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_ad_nonad_classifer_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_ad_nonad_classifer_en.md new file mode 100644 index 00000000000000..9d22e9ec8a3072 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_ad_nonad_classifer_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_ad_nonad_classifer BertForSequenceClassification from Kaleemullah +author: John Snow Labs +name: bert_base_uncased_ad_nonad_classifer +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ad_nonad_classifer` is a English model originally trained by Kaleemullah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ad_nonad_classifer_en_5.5.0_3.0_1727285254090.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ad_nonad_classifer_en_5.5.0_3.0_1727285254090.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_ad_nonad_classifer","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_ad_nonad_classifer", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ad_nonad_classifer| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Kaleemullah/bert-base-uncased-ad-nonad-classifer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_airlines_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_airlines_en.md new file mode 100644 index 00000000000000..2883afbfc6ca03 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_airlines_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_airlines BertForSequenceClassification from tasosk +author: John Snow Labs +name: bert_base_uncased_airlines +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_airlines` is a English model originally trained by tasosk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_airlines_en_5.5.0_3.0_1727268558796.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_airlines_en_5.5.0_3.0_1727268558796.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_airlines","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_airlines", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_airlines| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/tasosk/bert-base-uncased-airlines \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_airlines_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_airlines_pipeline_en.md new file mode 100644 index 00000000000000..3007c725f9a075 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_airlines_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_airlines_pipeline pipeline BertForSequenceClassification from tasosk +author: John Snow Labs +name: bert_base_uncased_airlines_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_airlines_pipeline` is a English model originally trained by tasosk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_airlines_pipeline_en_5.5.0_3.0_1727268581178.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_airlines_pipeline_en_5.5.0_3.0_1727268581178.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_airlines_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_airlines_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_airlines_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/tasosk/bert-base-uncased-airlines + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_alerts04142023_rsplit_2000_category1_severity_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_alerts04142023_rsplit_2000_category1_severity_en.md new file mode 100644 index 00000000000000..3623d7eb6391c6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_alerts04142023_rsplit_2000_category1_severity_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_alerts04142023_rsplit_2000_category1_severity BertForSequenceClassification from slewis +author: John Snow Labs +name: bert_base_uncased_alerts04142023_rsplit_2000_category1_severity +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_alerts04142023_rsplit_2000_category1_severity` is a English model originally trained by slewis. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_alerts04142023_rsplit_2000_category1_severity_en_5.5.0_3.0_1727287688543.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_alerts04142023_rsplit_2000_category1_severity_en_5.5.0_3.0_1727287688543.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_alerts04142023_rsplit_2000_category1_severity","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_alerts04142023_rsplit_2000_category1_severity", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_alerts04142023_rsplit_2000_category1_severity| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/slewis/bert-base-uncased_alerts04142023_rsplit_2000_Category1_Severity \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_alerts04142023_rsplit_2000_category1_severity_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_alerts04142023_rsplit_2000_category1_severity_pipeline_en.md new file mode 100644 index 00000000000000..73812e4fcdb964 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_alerts04142023_rsplit_2000_category1_severity_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_alerts04142023_rsplit_2000_category1_severity_pipeline pipeline BertForSequenceClassification from slewis +author: John Snow Labs +name: bert_base_uncased_alerts04142023_rsplit_2000_category1_severity_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_alerts04142023_rsplit_2000_category1_severity_pipeline` is a English model originally trained by slewis. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_alerts04142023_rsplit_2000_category1_severity_pipeline_en_5.5.0_3.0_1727287709665.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_alerts04142023_rsplit_2000_category1_severity_pipeline_en_5.5.0_3.0_1727287709665.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_alerts04142023_rsplit_2000_category1_severity_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_alerts04142023_rsplit_2000_category1_severity_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_alerts04142023_rsplit_2000_category1_severity_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/slewis/bert-base-uncased_alerts04142023_rsplit_2000_Category1_Severity + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_cola_int8_indic_languages_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_cola_int8_indic_languages_en.md new file mode 100644 index 00000000000000..377818af554ad7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_cola_int8_indic_languages_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_cola_int8_indic_languages BertForSequenceClassification from Intel +author: John Snow Labs +name: bert_base_uncased_cola_int8_indic_languages +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_cola_int8_indic_languages` is a English model originally trained by Intel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_cola_int8_indic_languages_en_5.5.0_3.0_1727269236157.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_cola_int8_indic_languages_en_5.5.0_3.0_1727269236157.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_cola_int8_indic_languages","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_cola_int8_indic_languages", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_cola_int8_indic_languages| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.0 MB| + +## References + +https://huggingface.co/Intel/bert-base-uncased-CoLA-int8-inc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_crows_pairs_classifieronly_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_crows_pairs_classifieronly_en.md new file mode 100644 index 00000000000000..c0b08a9b1872d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_crows_pairs_classifieronly_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_crows_pairs_classifieronly BertForSequenceClassification from asun17904 +author: John Snow Labs +name: bert_base_uncased_crows_pairs_classifieronly +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_crows_pairs_classifieronly` is a English model originally trained by asun17904. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_crows_pairs_classifieronly_en_5.5.0_3.0_1727279543934.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_crows_pairs_classifieronly_en_5.5.0_3.0_1727279543934.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_crows_pairs_classifieronly","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_crows_pairs_classifieronly", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_crows_pairs_classifieronly| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/asun17904/bert-base-uncased_crows_pairs_classifieronly \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_dstc10_kb_title_body_validate_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_dstc10_kb_title_body_validate_pipeline_en.md new file mode 100644 index 00000000000000..9cd68482ba1f2f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_dstc10_kb_title_body_validate_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_dstc10_kb_title_body_validate_pipeline pipeline BertForSequenceClassification from wilsontam +author: John Snow Labs +name: bert_base_uncased_dstc10_kb_title_body_validate_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_dstc10_kb_title_body_validate_pipeline` is a English model originally trained by wilsontam. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_dstc10_kb_title_body_validate_pipeline_en_5.5.0_3.0_1727288135713.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_dstc10_kb_title_body_validate_pipeline_en_5.5.0_3.0_1727288135713.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_dstc10_kb_title_body_validate_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_dstc10_kb_title_body_validate_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_dstc10_kb_title_body_validate_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/wilsontam/bert-base-uncased-dstc10-kb-title-body-validate + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_e_care_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_e_care_en.md new file mode 100644 index 00000000000000..79155c80275a1e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_e_care_en.md @@ -0,0 +1,86 @@ +--- +layout: model +title: English bert_base_uncased_e_care BertForQuestionAnswering from DunnBC22 +author: John Snow Labs +name: bert_base_uncased_e_care +date: 2024-09-25 +tags: [en, open_source, onnx, question_answering, bert] +task: Question Answering +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForQuestionAnswering +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForQuestionAnswering model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_e_care` is a English model originally trained by DunnBC22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_e_care_en_5.5.0_3.0_1727239194784.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_e_care_en_5.5.0_3.0_1727239194784.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = MultiDocumentAssembler() \ + .setInputCol(["question", "context"]) \ + .setOutputCol(["document_question", "document_context"]) + +spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_e_care","en") \ + .setInputCols(["document_question","document_context"]) \ + .setOutputCol("answer") + +pipeline = Pipeline().setStages([documentAssembler, spanClassifier]) +data = spark.createDataFrame([["What framework do I use?","I use spark-nlp."]]).toDF("document_question", "document_context") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new MultiDocumentAssembler() + .setInputCol(Array("question", "context")) + .setOutputCol(Array("document_question", "document_context")) + +val spanClassifier = BertForQuestionAnswering.pretrained("bert_base_uncased_e_care", "en") + .setInputCols(Array("document_question","document_context")) + .setOutputCol("answer") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, spanClassifier)) +val data = Seq("What framework do I use?","I use spark-nlp.").toDS.toDF("document_question", "document_context") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_e_care| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document_question, document_context]| +|Output Labels:|[answer]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/DunnBC22/bert-base-uncased-e_CARE \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_ear_mlma_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_ear_mlma_en.md new file mode 100644 index 00000000000000..1edb8a10f0cb36 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_ear_mlma_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_ear_mlma BertForSequenceClassification from MilaNLProc +author: John Snow Labs +name: bert_base_uncased_ear_mlma +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_ear_mlma` is a English model originally trained by MilaNLProc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ear_mlma_en_5.5.0_3.0_1727263485878.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_ear_mlma_en_5.5.0_3.0_1727263485878.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_ear_mlma","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_ear_mlma", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_ear_mlma| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/MilaNLProc/bert-base-uncased-ear-mlma \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_emotion_ft_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_emotion_ft_en.md new file mode 100644 index 00000000000000..bc0b92d05fcdd5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_emotion_ft_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_emotion_ft BertForSequenceClassification from colingao +author: John Snow Labs +name: bert_base_uncased_emotion_ft +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_emotion_ft` is a English model originally trained by colingao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_emotion_ft_en_5.5.0_3.0_1727276584617.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_emotion_ft_en_5.5.0_3.0_1727276584617.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_emotion_ft","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_emotion_ft", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_emotion_ft| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/colingao/bert-base-uncased_emotion_ft \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_fine_tuned_imdb_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_fine_tuned_imdb_en.md new file mode 100644 index 00000000000000..31b84c43af0edc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_fine_tuned_imdb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_fine_tuned_imdb BertForSequenceClassification from shre-db +author: John Snow Labs +name: bert_base_uncased_fine_tuned_imdb +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_fine_tuned_imdb` is a English model originally trained by shre-db. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_fine_tuned_imdb_en_5.5.0_3.0_1727261528486.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_fine_tuned_imdb_en_5.5.0_3.0_1727261528486.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_fine_tuned_imdb","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_fine_tuned_imdb", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_fine_tuned_imdb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/shre-db/Bert-Base-Uncased-Fine-Tuned-IMDB \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned2_cola_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned2_cola_en.md new file mode 100644 index 00000000000000..9c81729ddb62a7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned2_cola_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_finetuned2_cola BertForSequenceClassification from ilkekas +author: John Snow Labs +name: bert_base_uncased_finetuned2_cola +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned2_cola` is a English model originally trained by ilkekas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned2_cola_en_5.5.0_3.0_1727267764170.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned2_cola_en_5.5.0_3.0_1727267764170.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned2_cola","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned2_cola", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned2_cola| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/ilkekas/bert-base-uncased-finetuned2-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned2_cola_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned2_cola_pipeline_en.md new file mode 100644 index 00000000000000..df5f097b916f9a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned2_cola_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_finetuned2_cola_pipeline pipeline BertForSequenceClassification from ilkekas +author: John Snow Labs +name: bert_base_uncased_finetuned2_cola_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned2_cola_pipeline` is a English model originally trained by ilkekas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned2_cola_pipeline_en_5.5.0_3.0_1727267786213.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned2_cola_pipeline_en_5.5.0_3.0_1727267786213.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned2_cola_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned2_cola_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned2_cola_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/ilkekas/bert-base-uncased-finetuned2-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_amazon_reviews_multi_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_amazon_reviews_multi_en.md new file mode 100644 index 00000000000000..63430e8195d397 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_amazon_reviews_multi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_amazon_reviews_multi BertForSequenceClassification from JoelVIU +author: John Snow Labs +name: bert_base_uncased_finetuned_amazon_reviews_multi +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_amazon_reviews_multi` is a English model originally trained by JoelVIU. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_amazon_reviews_multi_en_5.5.0_3.0_1727286280588.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_amazon_reviews_multi_en_5.5.0_3.0_1727286280588.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_amazon_reviews_multi","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_amazon_reviews_multi", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_amazon_reviews_multi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/JoelVIU/bert-base-uncased-finetuned-amazon_reviews_multi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_cda_gender_neutral_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_cda_gender_neutral_en.md new file mode 100644 index 00000000000000..3424bc4280c90a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_cda_gender_neutral_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_cda_gender_neutral BertEmbeddings from zz990906 +author: John Snow Labs +name: bert_base_uncased_finetuned_cda_gender_neutral +date: 2024-09-25 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_cda_gender_neutral` is a English model originally trained by zz990906. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_cda_gender_neutral_en_5.5.0_3.0_1727232569580.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_cda_gender_neutral_en_5.5.0_3.0_1727232569580.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_base_uncased_finetuned_cda_gender_neutral","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_base_uncased_finetuned_cda_gender_neutral","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_cda_gender_neutral| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/zz990906/bert-base-uncased-finetuned-cda-gender-neutral \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_cola_avb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_cola_avb_pipeline_en.md new file mode 100644 index 00000000000000..8a174907cdd17f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_cola_avb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_cola_avb_pipeline pipeline BertForSequenceClassification from avb +author: John Snow Labs +name: bert_base_uncased_finetuned_cola_avb_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_cola_avb_pipeline` is a English model originally trained by avb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_cola_avb_pipeline_en_5.5.0_3.0_1727268577092.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_cola_avb_pipeline_en_5.5.0_3.0_1727268577092.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_cola_avb_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_cola_avb_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_cola_avb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/avb/bert-base-uncased-finetuned-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_cola_kaanha_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_cola_kaanha_en.md new file mode 100644 index 00000000000000..17351122f7dcd2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_cola_kaanha_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_cola_kaanha BertForSequenceClassification from KaanHa +author: John Snow Labs +name: bert_base_uncased_finetuned_cola_kaanha +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_cola_kaanha` is a English model originally trained by KaanHa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_cola_kaanha_en_5.5.0_3.0_1727287133479.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_cola_kaanha_en_5.5.0_3.0_1727287133479.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_cola_kaanha","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_cola_kaanha", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_cola_kaanha| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/KaanHa/bert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_cola_learning_rate_2e_05_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_cola_learning_rate_2e_05_en.md new file mode 100644 index 00000000000000..987031b7184264 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_cola_learning_rate_2e_05_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_cola_learning_rate_2e_05 BertForSequenceClassification from cansurav +author: John Snow Labs +name: bert_base_uncased_finetuned_cola_learning_rate_2e_05 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_cola_learning_rate_2e_05` is a English model originally trained by cansurav. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_cola_learning_rate_2e_05_en_5.5.0_3.0_1727286389430.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_cola_learning_rate_2e_05_en_5.5.0_3.0_1727286389430.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_cola_learning_rate_2e_05","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_cola_learning_rate_2e_05", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_cola_learning_rate_2e_05| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/cansurav/bert-base-uncased-finetuned-cola-learning_rate-2e-05 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_cola_sepehrbakhshi_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_cola_sepehrbakhshi_en.md new file mode 100644 index 00000000000000..8b02c39d0e8f81 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_cola_sepehrbakhshi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_cola_sepehrbakhshi BertForSequenceClassification from sepehrbakhshi +author: John Snow Labs +name: bert_base_uncased_finetuned_cola_sepehrbakhshi +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_cola_sepehrbakhshi` is a English model originally trained by sepehrbakhshi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_cola_sepehrbakhshi_en_5.5.0_3.0_1727288324819.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_cola_sepehrbakhshi_en_5.5.0_3.0_1727288324819.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_cola_sepehrbakhshi","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_cola_sepehrbakhshi", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_cola_sepehrbakhshi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/sepehrbakhshi/bert-base-uncased-finetuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_depression_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_depression_pipeline_en.md new file mode 100644 index 00000000000000..da445793d0c331 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_depression_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_depression_pipeline pipeline BertForSequenceClassification from welsachy +author: John Snow Labs +name: bert_base_uncased_finetuned_depression_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_depression_pipeline` is a English model originally trained by welsachy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_depression_pipeline_en_5.5.0_3.0_1727276762936.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_depression_pipeline_en_5.5.0_3.0_1727276762936.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_depression_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_depression_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_depression_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/welsachy/bert-base-uncased-finetuned-depression + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_detests_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_detests_en.md new file mode 100644 index 00000000000000..8941177e001a71 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_detests_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_detests BertForSequenceClassification from Pablo94 +author: John Snow Labs +name: bert_base_uncased_finetuned_detests +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_detests` is a English model originally trained by Pablo94. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_detests_en_5.5.0_3.0_1727268304494.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_detests_en_5.5.0_3.0_1727268304494.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_detests","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_detests", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_detests| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Pablo94/bert-base-uncased-finetuned-detests \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_detests_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_detests_pipeline_en.md new file mode 100644 index 00000000000000..ae52dad3402a76 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_detests_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_detests_pipeline pipeline BertForSequenceClassification from Pablo94 +author: John Snow Labs +name: bert_base_uncased_finetuned_detests_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_detests_pipeline` is a English model originally trained by Pablo94. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_detests_pipeline_en_5.5.0_3.0_1727268327256.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_detests_pipeline_en_5.5.0_3.0_1727268327256.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_detests_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_detests_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_detests_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Pablo94/bert-base-uncased-finetuned-detests + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_imdb_rman_rahimi_29_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_imdb_rman_rahimi_29_en.md new file mode 100644 index 00000000000000..5441edf05f0db1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_imdb_rman_rahimi_29_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_imdb_rman_rahimi_29 BertEmbeddings from rman-rahimi-29 +author: John Snow Labs +name: bert_base_uncased_finetuned_imdb_rman_rahimi_29 +date: 2024-09-25 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_imdb_rman_rahimi_29` is a English model originally trained by rman-rahimi-29. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_imdb_rman_rahimi_29_en_5.5.0_3.0_1727240722847.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_imdb_rman_rahimi_29_en_5.5.0_3.0_1727240722847.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_base_uncased_finetuned_imdb_rman_rahimi_29","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_base_uncased_finetuned_imdb_rman_rahimi_29","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_imdb_rman_rahimi_29| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/rman-rahimi-29/bert-base-uncased-finetuned-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_learningrate_2_cola_4e_05_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_learningrate_2_cola_4e_05_pipeline_en.md new file mode 100644 index 00000000000000..06b02b517aa49c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_learningrate_2_cola_4e_05_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_learningrate_2_cola_4e_05_pipeline pipeline BertForSequenceClassification from yagmurery +author: John Snow Labs +name: bert_base_uncased_finetuned_learningrate_2_cola_4e_05_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_learningrate_2_cola_4e_05_pipeline` is a English model originally trained by yagmurery. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_learningrate_2_cola_4e_05_pipeline_en_5.5.0_3.0_1727273487652.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_learningrate_2_cola_4e_05_pipeline_en_5.5.0_3.0_1727273487652.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_learningrate_2_cola_4e_05_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_learningrate_2_cola_4e_05_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_learningrate_2_cola_4e_05_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/yagmurery/bert-base-uncased-finetuned-learningRate-2-cola-4e-05 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_mnli_max_length_256_epoch_5_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_mnli_max_length_256_epoch_5_en.md new file mode 100644 index 00000000000000..75cb040e6d3b95 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_mnli_max_length_256_epoch_5_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_mnli_max_length_256_epoch_5 BertForSequenceClassification from yy642 +author: John Snow Labs +name: bert_base_uncased_finetuned_mnli_max_length_256_epoch_5 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_mnli_max_length_256_epoch_5` is a English model originally trained by yy642. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_mnli_max_length_256_epoch_5_en_5.5.0_3.0_1727278395887.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_mnli_max_length_256_epoch_5_en_5.5.0_3.0_1727278395887.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_mnli_max_length_256_epoch_5","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_mnli_max_length_256_epoch_5", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_mnli_max_length_256_epoch_5| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/yy642/bert-base-uncased-finetuned-mnli-max-length-256-epoch-5 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_mnli_max_length_256_epoch_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_mnli_max_length_256_epoch_5_pipeline_en.md new file mode 100644 index 00000000000000..260f5a0b35ddac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_mnli_max_length_256_epoch_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_mnli_max_length_256_epoch_5_pipeline pipeline BertForSequenceClassification from yy642 +author: John Snow Labs +name: bert_base_uncased_finetuned_mnli_max_length_256_epoch_5_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_mnli_max_length_256_epoch_5_pipeline` is a English model originally trained by yy642. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_mnli_max_length_256_epoch_5_pipeline_en_5.5.0_3.0_1727278417026.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_mnli_max_length_256_epoch_5_pipeline_en_5.5.0_3.0_1727278417026.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_mnli_max_length_256_epoch_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_mnli_max_length_256_epoch_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_mnli_max_length_256_epoch_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/yy642/bert-base-uncased-finetuned-mnli-max-length-256-epoch-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_mnli_rte_wnli_3_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_mnli_rte_wnli_3_en.md new file mode 100644 index 00000000000000..4991135ca6741a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_mnli_rte_wnli_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_mnli_rte_wnli_3 BertForSequenceClassification from yy642 +author: John Snow Labs +name: bert_base_uncased_finetuned_mnli_rte_wnli_3 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_mnli_rte_wnli_3` is a English model originally trained by yy642. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_mnli_rte_wnli_3_en_5.5.0_3.0_1727273560436.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_mnli_rte_wnli_3_en_5.5.0_3.0_1727273560436.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_mnli_rte_wnli_3","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_mnli_rte_wnli_3", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_mnli_rte_wnli_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/yy642/bert-base-uncased-finetuned-mnli-rte-wnli-3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_news_1929_1932_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_news_1929_1932_pipeline_en.md new file mode 100644 index 00000000000000..bf76a41e02352d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_news_1929_1932_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_news_1929_1932_pipeline pipeline BertEmbeddings from sally9805 +author: John Snow Labs +name: bert_base_uncased_finetuned_news_1929_1932_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_news_1929_1932_pipeline` is a English model originally trained by sally9805. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_news_1929_1932_pipeline_en_5.5.0_3.0_1727254945892.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_news_1929_1932_pipeline_en_5.5.0_3.0_1727254945892.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_news_1929_1932_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_news_1929_1932_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_news_1929_1932_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/sally9805/bert-base-uncased-finetuned-news-1929-1932 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_poli_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_poli_pipeline_en.md new file mode 100644 index 00000000000000..7758e327705ddd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_poli_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_poli_pipeline pipeline BertForSequenceClassification from lmajer +author: John Snow Labs +name: bert_base_uncased_finetuned_poli_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_poli_pipeline` is a English model originally trained by lmajer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_poli_pipeline_en_5.5.0_3.0_1727284948735.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_poli_pipeline_en_5.5.0_3.0_1727284948735.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_poli_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_poli_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_poli_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/lmajer/bert-base-uncased-finetuned-POLI + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_rte_max_length_512_epoch_10_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_rte_max_length_512_epoch_10_en.md new file mode 100644 index 00000000000000..5c4737ab074f66 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_rte_max_length_512_epoch_10_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_rte_max_length_512_epoch_10 BertForSequenceClassification from yy642 +author: John Snow Labs +name: bert_base_uncased_finetuned_rte_max_length_512_epoch_10 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_rte_max_length_512_epoch_10` is a English model originally trained by yy642. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_rte_max_length_512_epoch_10_en_5.5.0_3.0_1727272894453.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_rte_max_length_512_epoch_10_en_5.5.0_3.0_1727272894453.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_rte_max_length_512_epoch_10","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_rte_max_length_512_epoch_10", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_rte_max_length_512_epoch_10| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/yy642/bert-base-uncased-finetuned-rte-max-length-512-epoch-10 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_rte_max_length_512_epoch_10_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_rte_max_length_512_epoch_10_pipeline_en.md new file mode 100644 index 00000000000000..e2a116b1f50d8c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_rte_max_length_512_epoch_10_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_rte_max_length_512_epoch_10_pipeline pipeline BertForSequenceClassification from yy642 +author: John Snow Labs +name: bert_base_uncased_finetuned_rte_max_length_512_epoch_10_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_rte_max_length_512_epoch_10_pipeline` is a English model originally trained by yy642. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_rte_max_length_512_epoch_10_pipeline_en_5.5.0_3.0_1727272916326.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_rte_max_length_512_epoch_10_pipeline_en_5.5.0_3.0_1727272916326.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_rte_max_length_512_epoch_10_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_rte_max_length_512_epoch_10_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_rte_max_length_512_epoch_10_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/yy642/bert-base-uncased-finetuned-rte-max-length-512-epoch-10 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_rte_max_length_512_epoch_5_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_rte_max_length_512_epoch_5_pipeline_en.md new file mode 100644 index 00000000000000..317acb9f7ea01d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_rte_max_length_512_epoch_5_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_rte_max_length_512_epoch_5_pipeline pipeline BertForSequenceClassification from yy642 +author: John Snow Labs +name: bert_base_uncased_finetuned_rte_max_length_512_epoch_5_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_rte_max_length_512_epoch_5_pipeline` is a English model originally trained by yy642. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_rte_max_length_512_epoch_5_pipeline_en_5.5.0_3.0_1727286811073.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_rte_max_length_512_epoch_5_pipeline_en_5.5.0_3.0_1727286811073.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_rte_max_length_512_epoch_5_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_rte_max_length_512_epoch_5_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_rte_max_length_512_epoch_5_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/yy642/bert-base-uncased-finetuned-rte-max-length-512-epoch-5 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_stationary_epoch_update_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_stationary_epoch_update_en.md new file mode 100644 index 00000000000000..ff552f2769a0a5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_stationary_epoch_update_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_stationary_epoch_update BertForSequenceClassification from MKS3099 +author: John Snow Labs +name: bert_base_uncased_finetuned_stationary_epoch_update +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_stationary_epoch_update` is a English model originally trained by MKS3099. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_stationary_epoch_update_en_5.5.0_3.0_1727269239360.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_stationary_epoch_update_en_5.5.0_3.0_1727269239360.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_stationary_epoch_update","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_finetuned_stationary_epoch_update", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_stationary_epoch_update| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/MKS3099/bert-base-uncased-finetuned-stationary-epoch-update \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_toxic_comment_detection_ws23_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_toxic_comment_detection_ws23_pipeline_en.md new file mode 100644 index 00000000000000..7890190b921ad4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_finetuned_toxic_comment_detection_ws23_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_finetuned_toxic_comment_detection_ws23_pipeline pipeline BertForSequenceClassification from tillschwoerer +author: John Snow Labs +name: bert_base_uncased_finetuned_toxic_comment_detection_ws23_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_finetuned_toxic_comment_detection_ws23_pipeline` is a English model originally trained by tillschwoerer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_toxic_comment_detection_ws23_pipeline_en_5.5.0_3.0_1727261098857.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_finetuned_toxic_comment_detection_ws23_pipeline_en_5.5.0_3.0_1727261098857.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_finetuned_toxic_comment_detection_ws23_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_finetuned_toxic_comment_detection_ws23_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_finetuned_toxic_comment_detection_ws23_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/tillschwoerer/bert-base-uncased-finetuned-toxic-comment-detection-ws23 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_glue_cola_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_glue_cola_pipeline_en.md new file mode 100644 index 00000000000000..ccde384f823bad --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_glue_cola_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_glue_cola_pipeline pipeline BertForSequenceClassification from pmthangk09 +author: John Snow Labs +name: bert_base_uncased_glue_cola_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_glue_cola_pipeline` is a English model originally trained by pmthangk09. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_glue_cola_pipeline_en_5.5.0_3.0_1727266397271.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_glue_cola_pipeline_en_5.5.0_3.0_1727266397271.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_glue_cola_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_glue_cola_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_glue_cola_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/pmthangk09/bert-base-uncased-glue-cola + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_goemotions_original_finetuned_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_goemotions_original_finetuned_en.md new file mode 100644 index 00000000000000..49507c9e1f085e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_goemotions_original_finetuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_goemotions_original_finetuned BertForSequenceClassification from justin871030 +author: John Snow Labs +name: bert_base_uncased_goemotions_original_finetuned +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_goemotions_original_finetuned` is a English model originally trained by justin871030. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_goemotions_original_finetuned_en_5.5.0_3.0_1727256792152.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_goemotions_original_finetuned_en_5.5.0_3.0_1727256792152.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_goemotions_original_finetuned","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_goemotions_original_finetuned", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_goemotions_original_finetuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/justin871030/bert-base-uncased-goemotions-original-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_imdb_yujiepan_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_imdb_yujiepan_pipeline_en.md new file mode 100644 index 00000000000000..ea96d06150fbb1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_imdb_yujiepan_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_imdb_yujiepan_pipeline pipeline BertForSequenceClassification from yujiepan +author: John Snow Labs +name: bert_base_uncased_imdb_yujiepan_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_imdb_yujiepan_pipeline` is a English model originally trained by yujiepan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_imdb_yujiepan_pipeline_en_5.5.0_3.0_1727273456187.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_imdb_yujiepan_pipeline_en_5.5.0_3.0_1727273456187.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_imdb_yujiepan_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_imdb_yujiepan_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_imdb_yujiepan_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/yujiepan/bert-base-uncased-imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_hndc_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_hndc_en.md new file mode 100644 index 00000000000000..720a5ab9417a8f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_hndc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_issues_128_hndc BertEmbeddings from hndc +author: John Snow Labs +name: bert_base_uncased_issues_128_hndc +date: 2024-09-25 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_issues_128_hndc` is a English model originally trained by hndc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_hndc_en_5.5.0_3.0_1727241058797.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_hndc_en_5.5.0_3.0_1727241058797.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_base_uncased_issues_128_hndc","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_base_uncased_issues_128_hndc","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_issues_128_hndc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/hndc/bert-base-uncased-issues-128 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_hndc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_hndc_pipeline_en.md new file mode 100644 index 00000000000000..d2604dbeda17ce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_hndc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_issues_128_hndc_pipeline pipeline BertEmbeddings from hndc +author: John Snow Labs +name: bert_base_uncased_issues_128_hndc_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_issues_128_hndc_pipeline` is a English model originally trained by hndc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_hndc_pipeline_en_5.5.0_3.0_1727241079853.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_hndc_pipeline_en_5.5.0_3.0_1727241079853.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_issues_128_hndc_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_issues_128_hndc_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_issues_128_hndc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/hndc/bert-base-uncased-issues-128 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_makaniski_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_makaniski_en.md new file mode 100644 index 00000000000000..bde06600b17617 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_makaniski_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_issues_128_makaniski BertEmbeddings from makaniski +author: John Snow Labs +name: bert_base_uncased_issues_128_makaniski +date: 2024-09-25 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_issues_128_makaniski` is a English model originally trained by makaniski. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_makaniski_en_5.5.0_3.0_1727256250380.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_makaniski_en_5.5.0_3.0_1727256250380.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_base_uncased_issues_128_makaniski","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_base_uncased_issues_128_makaniski","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_issues_128_makaniski| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/makaniski/bert-base-uncased-issues-128 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_pensuke_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_pensuke_en.md new file mode 100644 index 00000000000000..adc7119511c955 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_pensuke_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_issues_128_pensuke BertEmbeddings from pensuke +author: John Snow Labs +name: bert_base_uncased_issues_128_pensuke +date: 2024-09-25 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_issues_128_pensuke` is a English model originally trained by pensuke. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_pensuke_en_5.5.0_3.0_1727258538352.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_pensuke_en_5.5.0_3.0_1727258538352.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_base_uncased_issues_128_pensuke","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_base_uncased_issues_128_pensuke","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_issues_128_pensuke| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/pensuke/bert-base-uncased-issues-128 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_robinsh2023_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_robinsh2023_en.md new file mode 100644 index 00000000000000..a9fdc7f3dcdf3b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_robinsh2023_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_issues_128_robinsh2023 BertEmbeddings from Robinsh2023 +author: John Snow Labs +name: bert_base_uncased_issues_128_robinsh2023 +date: 2024-09-25 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_issues_128_robinsh2023` is a English model originally trained by Robinsh2023. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_robinsh2023_en_5.5.0_3.0_1727236979136.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_robinsh2023_en_5.5.0_3.0_1727236979136.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_base_uncased_issues_128_robinsh2023","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_base_uncased_issues_128_robinsh2023","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_issues_128_robinsh2023| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/Robinsh2023/bert-base-uncased-issues-128 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_robinsh2023_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_robinsh2023_pipeline_en.md new file mode 100644 index 00000000000000..bed36dc5a81c19 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_robinsh2023_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_issues_128_robinsh2023_pipeline pipeline BertEmbeddings from Robinsh2023 +author: John Snow Labs +name: bert_base_uncased_issues_128_robinsh2023_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_issues_128_robinsh2023_pipeline` is a English model originally trained by Robinsh2023. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_robinsh2023_pipeline_en_5.5.0_3.0_1727237000542.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_robinsh2023_pipeline_en_5.5.0_3.0_1727237000542.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_issues_128_robinsh2023_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_issues_128_robinsh2023_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_issues_128_robinsh2023_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/Robinsh2023/bert-base-uncased-issues-128 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_seddiktrk_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_seddiktrk_en.md new file mode 100644 index 00000000000000..b92dd63079e977 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_seddiktrk_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_issues_128_seddiktrk BertEmbeddings from seddiktrk +author: John Snow Labs +name: bert_base_uncased_issues_128_seddiktrk +date: 2024-09-25 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_issues_128_seddiktrk` is a English model originally trained by seddiktrk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_seddiktrk_en_5.5.0_3.0_1727231353065.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_seddiktrk_en_5.5.0_3.0_1727231353065.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_base_uncased_issues_128_seddiktrk","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_base_uncased_issues_128_seddiktrk","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_issues_128_seddiktrk| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/seddiktrk/bert-base-uncased-issues-128 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_seddiktrk_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_seddiktrk_pipeline_en.md new file mode 100644 index 00000000000000..e58b1d21370cd6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_issues_128_seddiktrk_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_issues_128_seddiktrk_pipeline pipeline BertEmbeddings from seddiktrk +author: John Snow Labs +name: bert_base_uncased_issues_128_seddiktrk_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_issues_128_seddiktrk_pipeline` is a English model originally trained by seddiktrk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_seddiktrk_pipeline_en_5.5.0_3.0_1727231374198.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_issues_128_seddiktrk_pipeline_en_5.5.0_3.0_1727231374198.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_issues_128_seddiktrk_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_issues_128_seddiktrk_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_issues_128_seddiktrk_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/seddiktrk/bert-base-uncased-issues-128 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_job_bias_seq_cls_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_job_bias_seq_cls_en.md new file mode 100644 index 00000000000000..3ccf905358ccaf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_job_bias_seq_cls_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_job_bias_seq_cls BertForSequenceClassification from 2024-mcm-everitt-ryan +author: John Snow Labs +name: bert_base_uncased_job_bias_seq_cls +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_job_bias_seq_cls` is a English model originally trained by 2024-mcm-everitt-ryan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_job_bias_seq_cls_en_5.5.0_3.0_1727269409641.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_job_bias_seq_cls_en_5.5.0_3.0_1727269409641.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_job_bias_seq_cls","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_job_bias_seq_cls", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_job_bias_seq_cls| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/2024-mcm-everitt-ryan/bert-base-uncased-job-bias-seq-cls \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_kaggle_twitter_small_finetuned_clf_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_kaggle_twitter_small_finetuned_clf_en.md new file mode 100644 index 00000000000000..5f0a827e4ff165 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_kaggle_twitter_small_finetuned_clf_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_kaggle_twitter_small_finetuned_clf BertForSequenceClassification from zloelias +author: John Snow Labs +name: bert_base_uncased_kaggle_twitter_small_finetuned_clf +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_kaggle_twitter_small_finetuned_clf` is a English model originally trained by zloelias. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_kaggle_twitter_small_finetuned_clf_en_5.5.0_3.0_1727272761007.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_kaggle_twitter_small_finetuned_clf_en_5.5.0_3.0_1727272761007.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_kaggle_twitter_small_finetuned_clf","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_kaggle_twitter_small_finetuned_clf", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_kaggle_twitter_small_finetuned_clf| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/zloelias/bert-base-uncased-kaggle_twitter_small-finetuned-clf \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_kaggle_twitter_small_finetuned_clf_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_kaggle_twitter_small_finetuned_clf_pipeline_en.md new file mode 100644 index 00000000000000..3d95797c3b0442 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_kaggle_twitter_small_finetuned_clf_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_kaggle_twitter_small_finetuned_clf_pipeline pipeline BertForSequenceClassification from zloelias +author: John Snow Labs +name: bert_base_uncased_kaggle_twitter_small_finetuned_clf_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_kaggle_twitter_small_finetuned_clf_pipeline` is a English model originally trained by zloelias. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_kaggle_twitter_small_finetuned_clf_pipeline_en_5.5.0_3.0_1727272783099.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_kaggle_twitter_small_finetuned_clf_pipeline_en_5.5.0_3.0_1727272783099.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_kaggle_twitter_small_finetuned_clf_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_kaggle_twitter_small_finetuned_clf_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_kaggle_twitter_small_finetuned_clf_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/zloelias/bert-base-uncased-kaggle_twitter_small-finetuned-clf + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_kinyarwanda_finetuned_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_kinyarwanda_finetuned_en.md new file mode 100644 index 00000000000000..b723f286d24036 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_kinyarwanda_finetuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_kinyarwanda_finetuned BertEmbeddings from RogerB +author: John Snow Labs +name: bert_base_uncased_kinyarwanda_finetuned +date: 2024-09-25 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_kinyarwanda_finetuned` is a English model originally trained by RogerB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_kinyarwanda_finetuned_en_5.5.0_3.0_1727242863679.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_kinyarwanda_finetuned_en_5.5.0_3.0_1727242863679.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_base_uncased_kinyarwanda_finetuned","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_base_uncased_kinyarwanda_finetuned","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_kinyarwanda_finetuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/RogerB/bert-base-uncased-kinyarwanda-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_kinyarwanda_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_kinyarwanda_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..848d790829ea6a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_kinyarwanda_finetuned_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_kinyarwanda_finetuned_pipeline pipeline BertEmbeddings from RogerB +author: John Snow Labs +name: bert_base_uncased_kinyarwanda_finetuned_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_kinyarwanda_finetuned_pipeline` is a English model originally trained by RogerB. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_kinyarwanda_finetuned_pipeline_en_5.5.0_3.0_1727242885172.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_kinyarwanda_finetuned_pipeline_en_5.5.0_3.0_1727242885172.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_kinyarwanda_finetuned_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_kinyarwanda_finetuned_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_kinyarwanda_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/RogerB/bert-base-uncased-kinyarwanda-finetuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_malayalam_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_malayalam_pipeline_en.md new file mode 100644 index 00000000000000..f20b43bd5a6b19 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_malayalam_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_malayalam_pipeline pipeline BertEmbeddings from Tural +author: John Snow Labs +name: bert_base_uncased_malayalam_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_malayalam_pipeline` is a English model originally trained by Tural. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_malayalam_pipeline_en_5.5.0_3.0_1727232998365.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_malayalam_pipeline_en_5.5.0_3.0_1727232998365.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_malayalam_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_malayalam_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_malayalam_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.9 MB| + +## References + +https://huggingface.co/Tural/bert-base-uncased-ml + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_mlp_scirepeval_chemistry_large_textcls_rheology_20230913_3_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_mlp_scirepeval_chemistry_large_textcls_rheology_20230913_3_en.md new file mode 100644 index 00000000000000..e0126974b98a37 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_mlp_scirepeval_chemistry_large_textcls_rheology_20230913_3_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_mlp_scirepeval_chemistry_large_textcls_rheology_20230913_3 BertForSequenceClassification from jonas-luehrs +author: John Snow Labs +name: bert_base_uncased_mlp_scirepeval_chemistry_large_textcls_rheology_20230913_3 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_mlp_scirepeval_chemistry_large_textcls_rheology_20230913_3` is a English model originally trained by jonas-luehrs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_mlp_scirepeval_chemistry_large_textcls_rheology_20230913_3_en_5.5.0_3.0_1727263801059.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_mlp_scirepeval_chemistry_large_textcls_rheology_20230913_3_en_5.5.0_3.0_1727263801059.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_mlp_scirepeval_chemistry_large_textcls_rheology_20230913_3","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_mlp_scirepeval_chemistry_large_textcls_rheology_20230913_3", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_mlp_scirepeval_chemistry_large_textcls_rheology_20230913_3| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/jonas-luehrs/bert-base-uncased-MLP-scirepeval-chemistry-LARGE-textCLS-RHEOLOGY-20230913-3 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_qa_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_qa_classification_pipeline_en.md new file mode 100644 index 00000000000000..fcf0d049084348 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_qa_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_qa_classification_pipeline pipeline BertForSequenceClassification from kgourgou +author: John Snow Labs +name: bert_base_uncased_qa_classification_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_qa_classification_pipeline` is a English model originally trained by kgourgou. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_qa_classification_pipeline_en_5.5.0_3.0_1727285908570.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_qa_classification_pipeline_en_5.5.0_3.0_1727285908570.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_qa_classification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_qa_classification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_qa_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/kgourgou/bert-base-uncased-QA-classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_qnli_howey_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_qnli_howey_pipeline_en.md new file mode 100644 index 00000000000000..d7f225a1bb3758 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_qnli_howey_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_qnli_howey_pipeline pipeline BertForSequenceClassification from howey +author: John Snow Labs +name: bert_base_uncased_qnli_howey_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_qnli_howey_pipeline` is a English model originally trained by howey. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_qnli_howey_pipeline_en_5.5.0_3.0_1727269957256.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_qnli_howey_pipeline_en_5.5.0_3.0_1727269957256.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_qnli_howey_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_qnli_howey_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_qnli_howey_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/howey/bert-base-uncased-qnli + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_review1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_review1_pipeline_en.md new file mode 100644 index 00000000000000..2f74c9fbb8a8c2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_review1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_review1_pipeline pipeline BertForSequenceClassification from Iresh88 +author: John Snow Labs +name: bert_base_uncased_review1_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_review1_pipeline` is a English model originally trained by Iresh88. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_review1_pipeline_en_5.5.0_3.0_1727267682494.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_review1_pipeline_en_5.5.0_3.0_1727267682494.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_review1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_review1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_review1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Iresh88/bert-base-uncased-review1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_rte_from_bert_large_uncased_rte_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_rte_from_bert_large_uncased_rte_en.md new file mode 100644 index 00000000000000..19a245f1687c28 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_rte_from_bert_large_uncased_rte_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_rte_from_bert_large_uncased_rte BertForSequenceClassification from yoshitomo-matsubara +author: John Snow Labs +name: bert_base_uncased_rte_from_bert_large_uncased_rte +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_rte_from_bert_large_uncased_rte` is a English model originally trained by yoshitomo-matsubara. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_rte_from_bert_large_uncased_rte_en_5.5.0_3.0_1727269973506.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_rte_from_bert_large_uncased_rte_en_5.5.0_3.0_1727269973506.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_rte_from_bert_large_uncased_rte","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_rte_from_bert_large_uncased_rte", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_rte_from_bert_large_uncased_rte| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/yoshitomo-matsubara/bert-base-uncased-rte_from_bert-large-uncased-rte \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_rte_from_bert_large_uncased_rte_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_rte_from_bert_large_uncased_rte_pipeline_en.md new file mode 100644 index 00000000000000..5eb55ca6b48ddf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_rte_from_bert_large_uncased_rte_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_rte_from_bert_large_uncased_rte_pipeline pipeline BertForSequenceClassification from yoshitomo-matsubara +author: John Snow Labs +name: bert_base_uncased_rte_from_bert_large_uncased_rte_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_rte_from_bert_large_uncased_rte_pipeline` is a English model originally trained by yoshitomo-matsubara. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_rte_from_bert_large_uncased_rte_pipeline_en_5.5.0_3.0_1727269994734.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_rte_from_bert_large_uncased_rte_pipeline_en_5.5.0_3.0_1727269994734.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_rte_from_bert_large_uncased_rte_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_rte_from_bert_large_uncased_rte_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_rte_from_bert_large_uncased_rte_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/yoshitomo-matsubara/bert-base-uncased-rte_from_bert-large-uncased-rte + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_sst_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_sst_en.md new file mode 100644 index 00000000000000..0fd14d26195a79 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_sst_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_sst BertForSequenceClassification from pmthangk09 +author: John Snow Labs +name: bert_base_uncased_sst +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_sst` is a English model originally trained by pmthangk09. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_sst_en_5.5.0_3.0_1727278143092.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_sst_en_5.5.0_3.0_1727278143092.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_sst","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_sst", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_sst| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/pmthangk09/bert-base-uncased-sst \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_tajik_ner_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_tajik_ner_en.md new file mode 100644 index 00000000000000..baf373d81dffbc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_tajik_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_tajik_ner BertForTokenClassification from muhtasham +author: John Snow Labs +name: bert_base_uncased_tajik_ner +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_tajik_ner` is a English model originally trained by muhtasham. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_tajik_ner_en_5.5.0_3.0_1727260762157.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_tajik_ner_en_5.5.0_3.0_1727260762157.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_base_uncased_tajik_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_base_uncased_tajik_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_tajik_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/muhtasham/bert-base-uncased-tajik-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_tajik_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_tajik_ner_pipeline_en.md new file mode 100644 index 00000000000000..aadd53fcdc5617 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_tajik_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_uncased_tajik_ner_pipeline pipeline BertForTokenClassification from muhtasham +author: John Snow Labs +name: bert_base_uncased_tajik_ner_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_tajik_ner_pipeline` is a English model originally trained by muhtasham. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_tajik_ner_pipeline_en_5.5.0_3.0_1727260783318.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_tajik_ner_pipeline_en_5.5.0_3.0_1727260783318.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_uncased_tajik_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_uncased_tajik_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_tajik_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/muhtasham/bert-base-uncased-tajik-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_token_itr0_0_0001_train_all_test_null__second_train_set_null_false_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_token_itr0_0_0001_train_all_test_null__second_train_set_null_false_en.md new file mode 100644 index 00000000000000..7d08dd032c4e75 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_token_itr0_0_0001_train_all_test_null__second_train_set_null_false_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_token_itr0_0_0001_train_all_test_null__second_train_set_null_false BertForTokenClassification from ali2066 +author: John Snow Labs +name: bert_base_uncased_token_itr0_0_0001_train_all_test_null__second_train_set_null_false +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_token_itr0_0_0001_train_all_test_null__second_train_set_null_false` is a English model originally trained by ali2066. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_token_itr0_0_0001_train_all_test_null__second_train_set_null_false_en_5.5.0_3.0_1727260588150.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_token_itr0_0_0001_train_all_test_null__second_train_set_null_false_en_5.5.0_3.0_1727260588150.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_base_uncased_token_itr0_0_0001_train_all_test_null__second_train_set_null_false","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_base_uncased_token_itr0_0_0001_train_all_test_null__second_train_set_null_false", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_token_itr0_0_0001_train_all_test_null__second_train_set_null_false| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/ali2066/bert-base-uncased_token_itr0_0.0001_TRAIN_all_TEST_null__second_train_set_NULL_False \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_toxicity_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_toxicity_en.md new file mode 100644 index 00000000000000..7376f426de5fa3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_uncased_toxicity_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_uncased_toxicity BertForSequenceClassification from mohsenfayyaz +author: John Snow Labs +name: bert_base_uncased_toxicity +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_uncased_toxicity` is a English model originally trained by mohsenfayyaz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_uncased_toxicity_en_5.5.0_3.0_1727269890788.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_uncased_toxicity_en_5.5.0_3.0_1727269890788.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_toxicity","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_base_uncased_toxicity", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_uncased_toxicity| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/mohsenfayyaz/bert-base-uncased-toxicity \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_vietnamese_pipeline_vi.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_vietnamese_pipeline_vi.md new file mode 100644 index 00000000000000..5fbb84c256ba81 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_vietnamese_pipeline_vi.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Vietnamese bert_base_vietnamese_pipeline pipeline BertForSequenceClassification from ndbao2002 +author: John Snow Labs +name: bert_base_vietnamese_pipeline +date: 2024-09-25 +tags: [vi, open_source, pipeline, onnx] +task: Text Classification +language: vi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_vietnamese_pipeline` is a Vietnamese model originally trained by ndbao2002. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_vietnamese_pipeline_vi_5.5.0_3.0_1727278741439.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_vietnamese_pipeline_vi_5.5.0_3.0_1727278741439.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_vietnamese_pipeline", lang = "vi") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_vietnamese_pipeline", lang = "vi") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_vietnamese_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|vi| +|Size:|409.4 MB| + +## References + +https://huggingface.co/ndbao2002/bert-base-vi + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_vk_posts_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_vk_posts_en.md new file mode 100644 index 00000000000000..a12644fe0dbdb5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_vk_posts_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_base_vk_posts BertEmbeddings from serggor +author: John Snow Labs +name: bert_base_vk_posts +date: 2024-09-25 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_vk_posts` is a English model originally trained by serggor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_vk_posts_en_5.5.0_3.0_1727256498749.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_vk_posts_en_5.5.0_3.0_1727256498749.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_base_vk_posts","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_base_vk_posts","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_vk_posts| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|403.6 MB| + +## References + +https://huggingface.co/serggor/bert-base-vk-posts \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_base_vk_posts_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_base_vk_posts_pipeline_en.md new file mode 100644 index 00000000000000..ad3029946052f1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_base_vk_posts_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_base_vk_posts_pipeline pipeline BertEmbeddings from serggor +author: John Snow Labs +name: bert_base_vk_posts_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_base_vk_posts_pipeline` is a English model originally trained by serggor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_base_vk_posts_pipeline_en_5.5.0_3.0_1727256519859.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_base_vk_posts_pipeline_en_5.5.0_3.0_1727256519859.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_base_vk_posts_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_base_vk_posts_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_base_vk_posts_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/serggor/bert-base-vk-posts + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_baseline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_baseline_en.md new file mode 100644 index 00000000000000..b91db117ec9834 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_baseline_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_baseline BertForSequenceClassification from florentgbelidji +author: John Snow Labs +name: bert_baseline +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_baseline` is a English model originally trained by florentgbelidji. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_baseline_en_5.5.0_3.0_1727278504573.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_baseline_en_5.5.0_3.0_1727278504573.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_baseline","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_baseline", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_baseline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/florentgbelidji/BERT_baseline \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_classifier_tuned_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_classifier_tuned_en.md new file mode 100644 index 00000000000000..8d053083ee44ba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_classifier_tuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_classifier_tuned BertForSequenceClassification from omgavy +author: John Snow Labs +name: bert_classifier_tuned +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_classifier_tuned` is a English model originally trained by omgavy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_classifier_tuned_en_5.5.0_3.0_1727267541449.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_classifier_tuned_en_5.5.0_3.0_1727267541449.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_classifier_tuned","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_classifier_tuned", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_classifier_tuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/omgavy/bert-classifier-tuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_classifier_turkish_sentiment_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_classifier_turkish_sentiment_en.md new file mode 100644 index 00000000000000..906731dfb03d32 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_classifier_turkish_sentiment_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_classifier_turkish_sentiment BertForSequenceClassification from sunor +author: John Snow Labs +name: bert_classifier_turkish_sentiment +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_classifier_turkish_sentiment` is a English model originally trained by sunor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_classifier_turkish_sentiment_en_5.5.0_3.0_1727263454242.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_classifier_turkish_sentiment_en_5.5.0_3.0_1727263454242.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_classifier_turkish_sentiment","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_classifier_turkish_sentiment", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_classifier_turkish_sentiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|414.5 MB| + +## References + +https://huggingface.co/sunor/bert-classifier-turkish-sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_classifier_turkish_sentiment_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_classifier_turkish_sentiment_pipeline_en.md new file mode 100644 index 00000000000000..0cc764fc570a07 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_classifier_turkish_sentiment_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_classifier_turkish_sentiment_pipeline pipeline BertForSequenceClassification from sunor +author: John Snow Labs +name: bert_classifier_turkish_sentiment_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_classifier_turkish_sentiment_pipeline` is a English model originally trained by sunor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_classifier_turkish_sentiment_pipeline_en_5.5.0_3.0_1727263478776.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_classifier_turkish_sentiment_pipeline_en_5.5.0_3.0_1727263478776.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_classifier_turkish_sentiment_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_classifier_turkish_sentiment_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_classifier_turkish_sentiment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|414.5 MB| + +## References + +https://huggingface.co/sunor/bert-classifier-turkish-sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_cn_finetuning_wangyuwei_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_cn_finetuning_wangyuwei_pipeline_en.md new file mode 100644 index 00000000000000..b726b838b12ec2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_cn_finetuning_wangyuwei_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_cn_finetuning_wangyuwei_pipeline pipeline BertForSequenceClassification from wangyuwei +author: John Snow Labs +name: bert_cn_finetuning_wangyuwei_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_cn_finetuning_wangyuwei_pipeline` is a English model originally trained by wangyuwei. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_cn_finetuning_wangyuwei_pipeline_en_5.5.0_3.0_1727288868856.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_cn_finetuning_wangyuwei_pipeline_en_5.5.0_3.0_1727288868856.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_cn_finetuning_wangyuwei_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_cn_finetuning_wangyuwei_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_cn_finetuning_wangyuwei_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|383.3 MB| + +## References + +https://huggingface.co/wangyuwei/bert_cn_finetuning + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_election2020_twitter_stance_biden_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_election2020_twitter_stance_biden_en.md new file mode 100644 index 00000000000000..786113fa5ac6c2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_election2020_twitter_stance_biden_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_election2020_twitter_stance_biden BertForSequenceClassification from kornosk +author: John Snow Labs +name: bert_election2020_twitter_stance_biden +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_election2020_twitter_stance_biden` is a English model originally trained by kornosk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_election2020_twitter_stance_biden_en_5.5.0_3.0_1727239460016.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_election2020_twitter_stance_biden_en_5.5.0_3.0_1727239460016.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_election2020_twitter_stance_biden","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_election2020_twitter_stance_biden", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_election2020_twitter_stance_biden| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.8 MB| + +## References + +https://huggingface.co/kornosk/bert-election2020-twitter-stance-biden \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_emotions_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_emotions_en.md new file mode 100644 index 00000000000000..6eed5305f48186 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_emotions_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_emotions BertForSequenceClassification from Yanni8 +author: John Snow Labs +name: bert_emotions +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_emotions` is a English model originally trained by Yanni8. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_emotions_en_5.5.0_3.0_1727261714721.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_emotions_en_5.5.0_3.0_1727261714721.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_emotions","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_emotions", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_emotions| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Yanni8/bert-emotions \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_emotions_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_emotions_pipeline_en.md new file mode 100644 index 00000000000000..7bc0b843148c5a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_emotions_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_emotions_pipeline pipeline BertForSequenceClassification from Yanni8 +author: John Snow Labs +name: bert_emotions_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_emotions_pipeline` is a English model originally trained by Yanni8. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_emotions_pipeline_en_5.5.0_3.0_1727261736321.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_emotions_pipeline_en_5.5.0_3.0_1727261736321.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_emotions_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_emotions_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_emotions_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Yanni8/bert-emotions + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_fined_tuned_cola_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_fined_tuned_cola_en.md new file mode 100644 index 00000000000000..b7e19684f4a695 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_fined_tuned_cola_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_fined_tuned_cola BertForSequenceClassification from Utshav +author: John Snow Labs +name: bert_fined_tuned_cola +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_fined_tuned_cola` is a English model originally trained by Utshav. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_fined_tuned_cola_en_5.5.0_3.0_1727288341793.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_fined_tuned_cola_en_5.5.0_3.0_1727288341793.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_fined_tuned_cola","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_fined_tuned_cola", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_fined_tuned_cola| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/Utshav/bert-fined-tuned-cola \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_abbreviation_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_abbreviation_en.md new file mode 100644 index 00000000000000..bb85271802db91 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_abbreviation_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_finetuned_abbreviation BertForTokenClassification from dammy +author: John Snow Labs +name: bert_finetuned_abbreviation +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_abbreviation` is a English model originally trained by dammy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_abbreviation_en_5.5.0_3.0_1727260244412.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_abbreviation_en_5.5.0_3.0_1727260244412.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_abbreviation","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_abbreviation", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_abbreviation| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/dammy/bert-finetuned-abbreviation \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_abbreviation_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_abbreviation_pipeline_en.md new file mode 100644 index 00000000000000..61b00a4a07bb43 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_abbreviation_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_finetuned_abbreviation_pipeline pipeline BertForTokenClassification from dammy +author: John Snow Labs +name: bert_finetuned_abbreviation_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_abbreviation_pipeline` is a English model originally trained by dammy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_abbreviation_pipeline_en_5.5.0_3.0_1727260265611.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_abbreviation_pipeline_en_5.5.0_3.0_1727260265611.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_finetuned_abbreviation_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_finetuned_abbreviation_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_abbreviation_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/dammy/bert-finetuned-abbreviation + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_age_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_age_pipeline_en.md new file mode 100644 index 00000000000000..801f828ff5e566 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_age_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_finetuned_age_pipeline pipeline BertForSequenceClassification from Abderrahim2 +author: John Snow Labs +name: bert_finetuned_age_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_age_pipeline` is a English model originally trained by Abderrahim2. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_age_pipeline_en_5.5.0_3.0_1727276390139.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_age_pipeline_en_5.5.0_3.0_1727276390139.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_finetuned_age_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_finetuned_age_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_age_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|414.5 MB| + +## References + +https://huggingface.co/Abderrahim2/bert-finetuned-Age + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_hausa_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_hausa_ner_pipeline_en.md new file mode 100644 index 00000000000000..65c522ef37d80b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_hausa_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_finetuned_hausa_ner_pipeline pipeline BertForTokenClassification from peteryushunli +author: John Snow Labs +name: bert_finetuned_hausa_ner_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_hausa_ner_pipeline` is a English model originally trained by peteryushunli. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_hausa_ner_pipeline_en_5.5.0_3.0_1727260026645.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_hausa_ner_pipeline_en_5.5.0_3.0_1727260026645.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_finetuned_hausa_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_finetuned_hausa_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_hausa_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/peteryushunli/bert-finetuned-hausa_ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_cti_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_cti_en.md new file mode 100644 index 00000000000000..a9a43e3b081c7e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_cti_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_finetuned_ner_cti BertForTokenClassification from thongnef +author: John Snow Labs +name: bert_finetuned_ner_cti +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_ner_cti` is a English model originally trained by thongnef. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_cti_en_5.5.0_3.0_1727250597375.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_cti_en_5.5.0_3.0_1727250597375.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_cti","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_cti", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_cti| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/thongnef/bert-finetuned-ner-cti \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_cti_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_cti_pipeline_en.md new file mode 100644 index 00000000000000..4c21064956ec19 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_cti_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_finetuned_ner_cti_pipeline pipeline BertForTokenClassification from thongnef +author: John Snow Labs +name: bert_finetuned_ner_cti_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_ner_cti_pipeline` is a English model originally trained by thongnef. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_cti_pipeline_en_5.5.0_3.0_1727250618087.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_cti_pipeline_en_5.5.0_3.0_1727250618087.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_finetuned_ner_cti_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_finetuned_ner_cti_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_cti_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/thongnef/bert-finetuned-ner-cti + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_hydrochii_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_hydrochii_en.md new file mode 100644 index 00000000000000..76582a2ba76dc0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_hydrochii_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_finetuned_ner_hydrochii BertForTokenClassification from hydrochii +author: John Snow Labs +name: bert_finetuned_ner_hydrochii +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_ner_hydrochii` is a English model originally trained by hydrochii. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_hydrochii_en_5.5.0_3.0_1727270734813.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_hydrochii_en_5.5.0_3.0_1727270734813.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_hydrochii","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_hydrochii", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_hydrochii| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/hydrochii/bert-finetuned-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_mjwlyy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_mjwlyy_pipeline_en.md new file mode 100644 index 00000000000000..dc4bf9a11a4931 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_mjwlyy_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_finetuned_ner_mjwlyy_pipeline pipeline BertForTokenClassification from MJWLYY +author: John Snow Labs +name: bert_finetuned_ner_mjwlyy_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_ner_mjwlyy_pipeline` is a English model originally trained by MJWLYY. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_mjwlyy_pipeline_en_5.5.0_3.0_1727249785506.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_mjwlyy_pipeline_en_5.5.0_3.0_1727249785506.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_finetuned_ner_mjwlyy_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_finetuned_ner_mjwlyy_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_mjwlyy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/MJWLYY/bert-finetuned-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_proccyon_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_proccyon_en.md new file mode 100644 index 00000000000000..1b871c653e068a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_proccyon_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_finetuned_ner_proccyon BertForTokenClassification from Proccyon +author: John Snow Labs +name: bert_finetuned_ner_proccyon +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_ner_proccyon` is a English model originally trained by Proccyon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_proccyon_en_5.5.0_3.0_1727262284868.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_proccyon_en_5.5.0_3.0_1727262284868.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_proccyon","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_proccyon", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_proccyon| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/Proccyon/bert-finetuned-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_proccyon_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_proccyon_pipeline_en.md new file mode 100644 index 00000000000000..1c1cea251231cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_proccyon_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_finetuned_ner_proccyon_pipeline pipeline BertForTokenClassification from Proccyon +author: John Snow Labs +name: bert_finetuned_ner_proccyon_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_ner_proccyon_pipeline` is a English model originally trained by Proccyon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_proccyon_pipeline_en_5.5.0_3.0_1727262306020.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_proccyon_pipeline_en_5.5.0_3.0_1727262306020.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_finetuned_ner_proccyon_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_finetuned_ner_proccyon_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_proccyon_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/Proccyon/bert-finetuned-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_word_embedding_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_word_embedding_en.md new file mode 100644 index 00000000000000..41d2f9ee78a26f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_word_embedding_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_finetuned_ner_word_embedding BertForTokenClassification from lsoni +author: John Snow Labs +name: bert_finetuned_ner_word_embedding +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_ner_word_embedding` is a English model originally trained by lsoni. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_word_embedding_en_5.5.0_3.0_1727283090886.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_word_embedding_en_5.5.0_3.0_1727283090886.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_word_embedding","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_finetuned_ner_word_embedding", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_word_embedding| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/lsoni/bert-finetuned-ner-word-embedding \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_word_embedding_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_word_embedding_pipeline_en.md new file mode 100644 index 00000000000000..340e4830116fe1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_ner_word_embedding_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_finetuned_ner_word_embedding_pipeline pipeline BertForTokenClassification from lsoni +author: John Snow Labs +name: bert_finetuned_ner_word_embedding_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_ner_word_embedding_pipeline` is a English model originally trained by lsoni. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_word_embedding_pipeline_en_5.5.0_3.0_1727283112582.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_ner_word_embedding_pipeline_en_5.5.0_3.0_1727283112582.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_finetuned_ner_word_embedding_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_finetuned_ner_word_embedding_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_ner_word_embedding_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/lsoni/bert-finetuned-ner-word-embedding + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_semitic_languages_eval_english_lachin_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_semitic_languages_eval_english_lachin_en.md new file mode 100644 index 00000000000000..49c4a36d1b090d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_semitic_languages_eval_english_lachin_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_finetuned_semitic_languages_eval_english_lachin BertForSequenceClassification from Lachin +author: John Snow Labs +name: bert_finetuned_semitic_languages_eval_english_lachin +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_semitic_languages_eval_english_lachin` is a English model originally trained by Lachin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_semitic_languages_eval_english_lachin_en_5.5.0_3.0_1727287151317.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_semitic_languages_eval_english_lachin_en_5.5.0_3.0_1727287151317.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_finetuned_semitic_languages_eval_english_lachin","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_finetuned_semitic_languages_eval_english_lachin", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_semitic_languages_eval_english_lachin| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Lachin/bert-finetuned-sem_eval-english \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_semitic_languages_eval_english_lachin_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_semitic_languages_eval_english_lachin_pipeline_en.md new file mode 100644 index 00000000000000..b169efa574e4f9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_finetuned_semitic_languages_eval_english_lachin_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_finetuned_semitic_languages_eval_english_lachin_pipeline pipeline BertForSequenceClassification from Lachin +author: John Snow Labs +name: bert_finetuned_semitic_languages_eval_english_lachin_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_finetuned_semitic_languages_eval_english_lachin_pipeline` is a English model originally trained by Lachin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_finetuned_semitic_languages_eval_english_lachin_pipeline_en_5.5.0_3.0_1727287172332.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_finetuned_semitic_languages_eval_english_lachin_pipeline_en_5.5.0_3.0_1727287172332.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_finetuned_semitic_languages_eval_english_lachin_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_finetuned_semitic_languages_eval_english_lachin_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_finetuned_semitic_languages_eval_english_lachin_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Lachin/bert-finetuned-sem_eval-english + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_large_ambiguidade_sintatica_v1_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_large_ambiguidade_sintatica_v1_en.md new file mode 100644 index 00000000000000..5944cfc31f1935 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_large_ambiguidade_sintatica_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_large_ambiguidade_sintatica_v1 BertForSequenceClassification from osouza +author: John Snow Labs +name: bert_large_ambiguidade_sintatica_v1 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_ambiguidade_sintatica_v1` is a English model originally trained by osouza. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_ambiguidade_sintatica_v1_en_5.5.0_3.0_1727265779382.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_ambiguidade_sintatica_v1_en_5.5.0_3.0_1727265779382.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_large_ambiguidade_sintatica_v1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_large_ambiguidade_sintatica_v1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_ambiguidade_sintatica_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|408.2 MB| + +## References + +https://huggingface.co/osouza/bert-large-ambiguidade-sintatica-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_large_ambiguidade_sintatica_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_large_ambiguidade_sintatica_v1_pipeline_en.md new file mode 100644 index 00000000000000..293c6c03a9c1b3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_large_ambiguidade_sintatica_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_large_ambiguidade_sintatica_v1_pipeline pipeline BertForSequenceClassification from osouza +author: John Snow Labs +name: bert_large_ambiguidade_sintatica_v1_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_ambiguidade_sintatica_v1_pipeline` is a English model originally trained by osouza. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_ambiguidade_sintatica_v1_pipeline_en_5.5.0_3.0_1727265802099.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_ambiguidade_sintatica_v1_pipeline_en_5.5.0_3.0_1727265802099.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_large_ambiguidade_sintatica_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_large_ambiguidade_sintatica_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_ambiguidade_sintatica_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.2 MB| + +## References + +https://huggingface.co/osouza/bert-large-ambiguidade-sintatica-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_large_cased_finetuned_ner_augment_01_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_large_cased_finetuned_ner_augment_01_en.md new file mode 100644 index 00000000000000..30e00733a666a9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_large_cased_finetuned_ner_augment_01_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_large_cased_finetuned_ner_augment_01 BertForTokenClassification from lamthanhtin2811 +author: John Snow Labs +name: bert_large_cased_finetuned_ner_augment_01 +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_cased_finetuned_ner_augment_01` is a English model originally trained by lamthanhtin2811. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_cased_finetuned_ner_augment_01_en_5.5.0_3.0_1727282321654.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_cased_finetuned_ner_augment_01_en_5.5.0_3.0_1727282321654.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_large_cased_finetuned_ner_augment_01","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_large_cased_finetuned_ner_augment_01", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_cased_finetuned_ner_augment_01| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/lamthanhtin2811/bert-large-cased-finetuned-ner-augment-01 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_large_cased_finetuned_ner_augment_01_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_large_cased_finetuned_ner_augment_01_pipeline_en.md new file mode 100644 index 00000000000000..d600041f859f74 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_large_cased_finetuned_ner_augment_01_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_large_cased_finetuned_ner_augment_01_pipeline pipeline BertForTokenClassification from lamthanhtin2811 +author: John Snow Labs +name: bert_large_cased_finetuned_ner_augment_01_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_cased_finetuned_ner_augment_01_pipeline` is a English model originally trained by lamthanhtin2811. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_cased_finetuned_ner_augment_01_pipeline_en_5.5.0_3.0_1727282385046.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_cased_finetuned_ner_augment_01_pipeline_en_5.5.0_3.0_1727282385046.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_large_cased_finetuned_ner_augment_01_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_large_cased_finetuned_ner_augment_01_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_cased_finetuned_ner_augment_01_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/lamthanhtin2811/bert-large-cased-finetuned-ner-augment-01 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_large_ner_pii_062024_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_large_ner_pii_062024_en.md new file mode 100644 index 00000000000000..65e343651905e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_large_ner_pii_062024_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_large_ner_pii_062024 BertForTokenClassification from vuminhtue +author: John Snow Labs +name: bert_large_ner_pii_062024 +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_ner_pii_062024` is a English model originally trained by vuminhtue. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_ner_pii_062024_en_5.5.0_3.0_1727275036455.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_ner_pii_062024_en_5.5.0_3.0_1727275036455.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_large_ner_pii_062024","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_large_ner_pii_062024", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_ner_pii_062024| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/vuminhtue/Bert_large_NER_PII_062024 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_large_portuguese_archive_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_large_portuguese_archive_pipeline_en.md new file mode 100644 index 00000000000000..61c01491d65fda --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_large_portuguese_archive_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_large_portuguese_archive_pipeline pipeline BertForTokenClassification from lfcc +author: John Snow Labs +name: bert_large_portuguese_archive_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_portuguese_archive_pipeline` is a English model originally trained by lfcc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_portuguese_archive_pipeline_en_5.5.0_3.0_1727270979698.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_portuguese_archive_pipeline_en_5.5.0_3.0_1727270979698.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_large_portuguese_archive_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_large_portuguese_archive_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_portuguese_archive_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/lfcc/bert-large-pt-archive + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_large_uncased_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_large_uncased_en.md new file mode 100644 index 00000000000000..da71f4e73bf065 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_large_uncased_en.md @@ -0,0 +1,92 @@ +--- +layout: model +title: BERT Embeddings (Large Uncased) +author: John Snow Labs +name: bert_large_uncased +date: 2024-09-25 +tags: [open_source, embeddings, en, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This model contains a deep bidirectional transformer trained on Wikipedia and the BookCorpus. The details are described in the paper "[BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)". + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_uncased_en_5.5.0_3.0_1727242974255.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_uncased_en_5.5.0_3.0_1727242974255.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +... +embeddings = BertEmbeddings.pretrained("bert_large_uncased", "en") \ +.setInputCols("sentence", "token") \ +.setOutputCol("embeddings") +nlp_pipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, embeddings]) +pipeline_model = nlp_pipeline.fit(spark.createDataFrame([[""]]).toDF("text")) +result = pipeline_model.transform(spark.createDataFrame([['I love NLP']], ["text"])) +``` +```scala +... +val embeddings = BertEmbeddings.pretrained("bert_large_uncased", "en") +.setInputCols("sentence", "token") +.setOutputCol("embeddings") +val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, embeddings)) +val data = Seq("I love NLP").toDF("text") +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu + +text = ["I love NLP"] +embeddings_df = nlu.load('en.embed.bert.large_uncased').predict(text, output_level='token') +embeddings_df +``` +
+ +## Results + +```bash + + en_embed_bert_large_uncased_embeddings token + + [-0.07447264343500137, -0.337308406829834, -0.... I + [-0.5735481977462769, -0.3580206632614136, -0.... love + [-0.3929762840270996, -0.4147087037563324, 0.2... NLP +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_uncased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|1.3 GB| \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_large_uncased_english_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_large_uncased_english_ner_pipeline_en.md new file mode 100644 index 00000000000000..036d0a62f168f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_large_uncased_english_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_large_uncased_english_ner_pipeline pipeline BertForTokenClassification from n6ai +author: John Snow Labs +name: bert_large_uncased_english_ner_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_uncased_english_ner_pipeline` is a English model originally trained by n6ai. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_uncased_english_ner_pipeline_en_5.5.0_3.0_1727281779260.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_uncased_english_ner_pipeline_en_5.5.0_3.0_1727281779260.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_large_uncased_english_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_large_uncased_english_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_uncased_english_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/n6ai/bert-large-uncased-en-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_large_uncased_finetuned_edos_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_large_uncased_finetuned_edos_pipeline_en.md new file mode 100644 index 00000000000000..8229d6e80fee0b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_large_uncased_finetuned_edos_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_large_uncased_finetuned_edos_pipeline pipeline BertForSequenceClassification from reinforz +author: John Snow Labs +name: bert_large_uncased_finetuned_edos_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_uncased_finetuned_edos_pipeline` is a English model originally trained by reinforz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_uncased_finetuned_edos_pipeline_en_5.5.0_3.0_1727269394576.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_uncased_finetuned_edos_pipeline_en_5.5.0_3.0_1727269394576.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_large_uncased_finetuned_edos_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_large_uncased_finetuned_edos_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_uncased_finetuned_edos_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/reinforz/bert-large-uncased-finetuned-edos + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_large_uncased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_large_uncased_pipeline_en.md new file mode 100644 index 00000000000000..2f83dab680ec15 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_large_uncased_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_large_uncased_pipeline pipeline BertEmbeddings from google-bert +author: John Snow Labs +name: bert_large_uncased_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_uncased_pipeline` is a English model originally trained by google-bert. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_uncased_pipeline_en_5.5.0_3.0_1727243037738.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_uncased_pipeline_en_5.5.0_3.0_1727243037738.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_large_uncased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_large_uncased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/google-bert/bert-large-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_large_uncased_wnli_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_large_uncased_wnli_en.md new file mode 100644 index 00000000000000..e05894ebd106d8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_large_uncased_wnli_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_large_uncased_wnli BertForSequenceClassification from yoshitomo-matsubara +author: John Snow Labs +name: bert_large_uncased_wnli +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_large_uncased_wnli` is a English model originally trained by yoshitomo-matsubara. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_large_uncased_wnli_en_5.5.0_3.0_1727285124102.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_large_uncased_wnli_en_5.5.0_3.0_1727285124102.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_large_uncased_wnli","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_large_uncased_wnli", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_large_uncased_wnli| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/yoshitomo-matsubara/bert-large-uncased-wnli \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_mini_domain_adapted_imdb_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_mini_domain_adapted_imdb_en.md new file mode 100644 index 00000000000000..4b4bbbd97df26d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_mini_domain_adapted_imdb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_mini_domain_adapted_imdb BertEmbeddings from rasyosef +author: John Snow Labs +name: bert_mini_domain_adapted_imdb +date: 2024-09-25 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_mini_domain_adapted_imdb` is a English model originally trained by rasyosef. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_mini_domain_adapted_imdb_en_5.5.0_3.0_1727240872831.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_mini_domain_adapted_imdb_en_5.5.0_3.0_1727240872831.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_mini_domain_adapted_imdb","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_mini_domain_adapted_imdb","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_mini_domain_adapted_imdb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|41.9 MB| + +## References + +https://huggingface.co/rasyosef/bert-mini-domain-adapted-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_mini_domain_adapted_imdb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_mini_domain_adapted_imdb_pipeline_en.md new file mode 100644 index 00000000000000..7df871cf6cfc8d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_mini_domain_adapted_imdb_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_mini_domain_adapted_imdb_pipeline pipeline BertEmbeddings from rasyosef +author: John Snow Labs +name: bert_mini_domain_adapted_imdb_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_mini_domain_adapted_imdb_pipeline` is a English model originally trained by rasyosef. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_mini_domain_adapted_imdb_pipeline_en_5.5.0_3.0_1727240875150.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_mini_domain_adapted_imdb_pipeline_en_5.5.0_3.0_1727240875150.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_mini_domain_adapted_imdb_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_mini_domain_adapted_imdb_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_mini_domain_adapted_imdb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|41.9 MB| + +## References + +https://huggingface.co/rasyosef/bert-mini-domain-adapted-imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_mini_sst2_distilled_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_mini_sst2_distilled_en.md new file mode 100644 index 00000000000000..1ba10dfd9a782c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_mini_sst2_distilled_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_mini_sst2_distilled BertForSequenceClassification from philschmid +author: John Snow Labs +name: bert_mini_sst2_distilled +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_mini_sst2_distilled` is a English model originally trained by philschmid. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_mini_sst2_distilled_en_5.5.0_3.0_1727269691593.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_mini_sst2_distilled_en_5.5.0_3.0_1727269691593.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_mini_sst2_distilled","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_mini_sst2_distilled", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_mini_sst2_distilled| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|42.1 MB| + +## References + +https://huggingface.co/philschmid/bert-mini-sst2-distilled \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_mini_sst2_distilled_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_mini_sst2_distilled_pipeline_en.md new file mode 100644 index 00000000000000..e050214574312e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_mini_sst2_distilled_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_mini_sst2_distilled_pipeline pipeline BertForSequenceClassification from philschmid +author: John Snow Labs +name: bert_mini_sst2_distilled_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_mini_sst2_distilled_pipeline` is a English model originally trained by philschmid. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_mini_sst2_distilled_pipeline_en_5.5.0_3.0_1727269694087.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_mini_sst2_distilled_pipeline_en_5.5.0_3.0_1727269694087.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_mini_sst2_distilled_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_mini_sst2_distilled_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_mini_sst2_distilled_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|42.1 MB| + +## References + +https://huggingface.co/philschmid/bert-mini-sst2-distilled + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_mrpc_distilled_cka_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_mrpc_distilled_cka_en.md new file mode 100644 index 00000000000000..afa330d4837a32 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_mrpc_distilled_cka_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_mrpc_distilled_cka BertForSequenceClassification from Sayan01 +author: John Snow Labs +name: bert_mrpc_distilled_cka +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_mrpc_distilled_cka` is a English model originally trained by Sayan01. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_mrpc_distilled_cka_en_5.5.0_3.0_1727268774840.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_mrpc_distilled_cka_en_5.5.0_3.0_1727268774840.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_mrpc_distilled_cka","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_mrpc_distilled_cka", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_mrpc_distilled_cka| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|154.8 MB| + +## References + +https://huggingface.co/Sayan01/bert-mrpc-distilled-cka \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_multi_pad_ner_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_multi_pad_ner_en.md new file mode 100644 index 00000000000000..102e031a8dbde1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_multi_pad_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_multi_pad_ner BertForTokenClassification from ArseniyBolotin +author: John Snow Labs +name: bert_multi_pad_ner +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_multi_pad_ner` is a English model originally trained by ArseniyBolotin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_multi_pad_ner_en_5.5.0_3.0_1727263187379.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_multi_pad_ner_en_5.5.0_3.0_1727263187379.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_multi_pad_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_multi_pad_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_multi_pad_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|665.1 MB| + +## References + +https://huggingface.co/ArseniyBolotin/bert-multi-PAD-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_multi_pad_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_multi_pad_ner_pipeline_en.md new file mode 100644 index 00000000000000..15e7ba3dbe984a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_multi_pad_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_multi_pad_ner_pipeline pipeline BertForTokenClassification from ArseniyBolotin +author: John Snow Labs +name: bert_multi_pad_ner_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_multi_pad_ner_pipeline` is a English model originally trained by ArseniyBolotin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_multi_pad_ner_pipeline_en_5.5.0_3.0_1727263220672.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_multi_pad_ner_pipeline_en_5.5.0_3.0_1727263220672.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_multi_pad_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_multi_pad_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_multi_pad_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|665.1 MB| + +## References + +https://huggingface.co/ArseniyBolotin/bert-multi-PAD-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_nlp_project_ft_imdb_ds_news_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_nlp_project_ft_imdb_ds_news_en.md new file mode 100644 index 00000000000000..6f03583c58363e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_nlp_project_ft_imdb_ds_news_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_nlp_project_ft_imdb_ds_news BertForSequenceClassification from MatFil99 +author: John Snow Labs +name: bert_nlp_project_ft_imdb_ds_news +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_nlp_project_ft_imdb_ds_news` is a English model originally trained by MatFil99. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_nlp_project_ft_imdb_ds_news_en_5.5.0_3.0_1727278738362.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_nlp_project_ft_imdb_ds_news_en_5.5.0_3.0_1727278738362.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_nlp_project_ft_imdb_ds_news","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_nlp_project_ft_imdb_ds_news", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_nlp_project_ft_imdb_ds_news| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/MatFil99/bert-nlp-project-ft-imdb-ds-news \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_nlp_project_ft_imdb_ds_news_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_nlp_project_ft_imdb_ds_news_pipeline_en.md new file mode 100644 index 00000000000000..3edd4a3b0b5f24 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_nlp_project_ft_imdb_ds_news_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_nlp_project_ft_imdb_ds_news_pipeline pipeline BertForSequenceClassification from MatFil99 +author: John Snow Labs +name: bert_nlp_project_ft_imdb_ds_news_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_nlp_project_ft_imdb_ds_news_pipeline` is a English model originally trained by MatFil99. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_nlp_project_ft_imdb_ds_news_pipeline_en_5.5.0_3.0_1727278760646.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_nlp_project_ft_imdb_ds_news_pipeline_en_5.5.0_3.0_1727278760646.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_nlp_project_ft_imdb_ds_news_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_nlp_project_ft_imdb_ds_news_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_nlp_project_ft_imdb_ds_news_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/MatFil99/bert-nlp-project-ft-imdb-ds-news + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_persian_farsi_base_uncased_finetuned_parsbert_fa.md b/docs/_posts/ahmedlone127/2024-09-25-bert_persian_farsi_base_uncased_finetuned_parsbert_fa.md new file mode 100644 index 00000000000000..15fb132e1e4864 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_persian_farsi_base_uncased_finetuned_parsbert_fa.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Persian bert_persian_farsi_base_uncased_finetuned_parsbert BertEmbeddings from Yasamansaffari73 +author: John Snow Labs +name: bert_persian_farsi_base_uncased_finetuned_parsbert +date: 2024-09-25 +tags: [fa, open_source, onnx, embeddings, bert] +task: Embeddings +language: fa +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_persian_farsi_base_uncased_finetuned_parsbert` is a Persian model originally trained by Yasamansaffari73. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_persian_farsi_base_uncased_finetuned_parsbert_fa_5.5.0_3.0_1727241132849.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_persian_farsi_base_uncased_finetuned_parsbert_fa_5.5.0_3.0_1727241132849.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_persian_farsi_base_uncased_finetuned_parsbert","fa") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_persian_farsi_base_uncased_finetuned_parsbert","fa") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_persian_farsi_base_uncased_finetuned_parsbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|fa| +|Size:|606.5 MB| + +## References + +https://huggingface.co/Yasamansaffari73/bert-fa-base-uncased-finetuned-ParsBert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline_fa.md b/docs/_posts/ahmedlone127/2024-09-25-bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline_fa.md new file mode 100644 index 00000000000000..c0dfbd8abb714e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline_fa.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Persian bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline pipeline BertEmbeddings from Yasamansaffari73 +author: John Snow Labs +name: bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline +date: 2024-09-25 +tags: [fa, open_source, pipeline, onnx] +task: Embeddings +language: fa +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline` is a Persian model originally trained by Yasamansaffari73. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline_fa_5.5.0_3.0_1727241163964.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline_fa_5.5.0_3.0_1727241163964.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline", lang = "fa") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline", lang = "fa") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_persian_farsi_base_uncased_finetuned_parsbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|fa| +|Size:|606.5 MB| + +## References + +https://huggingface.co/Yasamansaffari73/bert-fa-base-uncased-finetuned-ParsBert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_phrasebank_sentiment_analysis_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_phrasebank_sentiment_analysis_en.md new file mode 100644 index 00000000000000..7a94276a86dde7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_phrasebank_sentiment_analysis_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_phrasebank_sentiment_analysis BertForSequenceClassification from pkbiswas +author: John Snow Labs +name: bert_phrasebank_sentiment_analysis +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_phrasebank_sentiment_analysis` is a English model originally trained by pkbiswas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_phrasebank_sentiment_analysis_en_5.5.0_3.0_1727264209163.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_phrasebank_sentiment_analysis_en_5.5.0_3.0_1727264209163.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_phrasebank_sentiment_analysis","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_phrasebank_sentiment_analysis", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_phrasebank_sentiment_analysis| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/pkbiswas/Bert-Phrasebank-Sentiment-Analysis \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_phrasebank_sentiment_analysis_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_phrasebank_sentiment_analysis_pipeline_en.md new file mode 100644 index 00000000000000..9ffc7ab67a41b6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_phrasebank_sentiment_analysis_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_phrasebank_sentiment_analysis_pipeline pipeline BertForSequenceClassification from pkbiswas +author: John Snow Labs +name: bert_phrasebank_sentiment_analysis_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_phrasebank_sentiment_analysis_pipeline` is a English model originally trained by pkbiswas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_phrasebank_sentiment_analysis_pipeline_en_5.5.0_3.0_1727264230213.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_phrasebank_sentiment_analysis_pipeline_en_5.5.0_3.0_1727264230213.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_phrasebank_sentiment_analysis_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_phrasebank_sentiment_analysis_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_phrasebank_sentiment_analysis_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/pkbiswas/Bert-Phrasebank-Sentiment-Analysis + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_pooling_based_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_pooling_based_en.md new file mode 100644 index 00000000000000..06a8fb941346d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_pooling_based_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_pooling_based BertForSequenceClassification from elifcen +author: John Snow Labs +name: bert_pooling_based +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_pooling_based` is a English model originally trained by elifcen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_pooling_based_en_5.5.0_3.0_1727284833679.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_pooling_based_en_5.5.0_3.0_1727284833679.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_pooling_based","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_pooling_based", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_pooling_based| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/elifcen/bert-pooling-based \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_pretrained_wikitext_2_raw_v1_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_pretrained_wikitext_2_raw_v1_en.md new file mode 100644 index 00000000000000..7b42358bdd40e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_pretrained_wikitext_2_raw_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_pretrained_wikitext_2_raw_v1 BertEmbeddings from dimpo +author: John Snow Labs +name: bert_pretrained_wikitext_2_raw_v1 +date: 2024-09-25 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_pretrained_wikitext_2_raw_v1` is a English model originally trained by dimpo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_pretrained_wikitext_2_raw_v1_en_5.5.0_3.0_1727256191997.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_pretrained_wikitext_2_raw_v1_en_5.5.0_3.0_1727256191997.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("bert_pretrained_wikitext_2_raw_v1","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("bert_pretrained_wikitext_2_raw_v1","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_pretrained_wikitext_2_raw_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|407.9 MB| + +## References + +https://huggingface.co/dimpo/bert-pretrained-wikitext-2-raw-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_semaphore_prediction_w2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_semaphore_prediction_w2_pipeline_en.md new file mode 100644 index 00000000000000..787bb017cda68a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_semaphore_prediction_w2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_semaphore_prediction_w2_pipeline pipeline BertForSequenceClassification from bondi +author: John Snow Labs +name: bert_semaphore_prediction_w2_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_semaphore_prediction_w2_pipeline` is a English model originally trained by bondi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_semaphore_prediction_w2_pipeline_en_5.5.0_3.0_1727284708832.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_semaphore_prediction_w2_pipeline_en_5.5.0_3.0_1727284708832.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_semaphore_prediction_w2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_semaphore_prediction_w2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_semaphore_prediction_w2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|411.7 MB| + +## References + +https://huggingface.co/bondi/bert-semaphore-prediction-w2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_sentiment_classification_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_sentiment_classification_en.md new file mode 100644 index 00000000000000..ffdcd331cbe930 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_sentiment_classification_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_sentiment_classification BertForSequenceClassification from Naren579 +author: John Snow Labs +name: bert_sentiment_classification +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_sentiment_classification` is a English model originally trained by Naren579. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sentiment_classification_en_5.5.0_3.0_1727267226186.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sentiment_classification_en_5.5.0_3.0_1727267226186.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_sentiment_classification","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_sentiment_classification", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sentiment_classification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Naren579/BERT-Sentiment-classification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_sentiment_classification_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_sentiment_classification_pipeline_en.md new file mode 100644 index 00000000000000..7c184bdf92c84d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_sentiment_classification_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_sentiment_classification_pipeline pipeline BertForSequenceClassification from Naren579 +author: John Snow Labs +name: bert_sentiment_classification_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_sentiment_classification_pipeline` is a English model originally trained by Naren579. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sentiment_classification_pipeline_en_5.5.0_3.0_1727267248567.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sentiment_classification_pipeline_en_5.5.0_3.0_1727267248567.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_sentiment_classification_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_sentiment_classification_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sentiment_classification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Naren579/BERT-Sentiment-classification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_sst5_padding50model_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_sst5_padding50model_en.md new file mode 100644 index 00000000000000..d38861560f89ef --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_sst5_padding50model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_sst5_padding50model BertForSequenceClassification from Realgon +author: John Snow Labs +name: bert_sst5_padding50model +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_sst5_padding50model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_sst5_padding50model_en_5.5.0_3.0_1727287947774.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_sst5_padding50model_en_5.5.0_3.0_1727287947774.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_sst5_padding50model","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_sst5_padding50model", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_sst5_padding50model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Realgon/bert_sst5_padding50model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_swe_skills_ner_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_swe_skills_ner_en.md new file mode 100644 index 00000000000000..a19fd483f2dc59 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_swe_skills_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_swe_skills_ner BertForTokenClassification from RJuro +author: John Snow Labs +name: bert_swe_skills_ner +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_swe_skills_ner` is a English model originally trained by RJuro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_swe_skills_ner_en_5.5.0_3.0_1727275164826.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_swe_skills_ner_en_5.5.0_3.0_1727275164826.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_swe_skills_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_swe_skills_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_swe_skills_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|465.2 MB| + +## References + +https://huggingface.co/RJuro/bert-swe-skills-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_swe_skills_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_swe_skills_ner_pipeline_en.md new file mode 100644 index 00000000000000..c1ceaf25d16cfa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_swe_skills_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_swe_skills_ner_pipeline pipeline BertForTokenClassification from RJuro +author: John Snow Labs +name: bert_swe_skills_ner_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_swe_skills_ner_pipeline` is a English model originally trained by RJuro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_swe_skills_ner_pipeline_en_5.5.0_3.0_1727275189804.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_swe_skills_ner_pipeline_en_5.5.0_3.0_1727275189804.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_swe_skills_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_swe_skills_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_swe_skills_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|465.3 MB| + +## References + +https://huggingface.co/RJuro/bert-swe-skills-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_tiny_emotion_kd_bert_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_tiny_emotion_kd_bert_en.md new file mode 100644 index 00000000000000..bce6b3e6b358f9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_tiny_emotion_kd_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_tiny_emotion_kd_bert BertForSequenceClassification from gokuls +author: John Snow Labs +name: bert_tiny_emotion_kd_bert +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_tiny_emotion_kd_bert` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_tiny_emotion_kd_bert_en_5.5.0_3.0_1727279234958.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_tiny_emotion_kd_bert_en_5.5.0_3.0_1727279234958.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_tiny_emotion_kd_bert","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_tiny_emotion_kd_bert", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_tiny_emotion_kd_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|16.7 MB| + +## References + +https://huggingface.co/gokuls/bert-tiny-emotion-KD-BERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_tiny_emotion_kd_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_tiny_emotion_kd_bert_pipeline_en.md new file mode 100644 index 00000000000000..b078379ff5c44a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_tiny_emotion_kd_bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_tiny_emotion_kd_bert_pipeline pipeline BertForSequenceClassification from gokuls +author: John Snow Labs +name: bert_tiny_emotion_kd_bert_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_tiny_emotion_kd_bert_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_tiny_emotion_kd_bert_pipeline_en_5.5.0_3.0_1727279236292.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_tiny_emotion_kd_bert_pipeline_en_5.5.0_3.0_1727279236292.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_tiny_emotion_kd_bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_tiny_emotion_kd_bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_tiny_emotion_kd_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|16.8 MB| + +## References + +https://huggingface.co/gokuls/bert-tiny-emotion-KD-BERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_tiny_massive_intent_kd_bert_and_distilbert_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_tiny_massive_intent_kd_bert_and_distilbert_en.md new file mode 100644 index 00000000000000..8db7ad884c1106 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_tiny_massive_intent_kd_bert_and_distilbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_tiny_massive_intent_kd_bert_and_distilbert BertForSequenceClassification from gokuls +author: John Snow Labs +name: bert_tiny_massive_intent_kd_bert_and_distilbert +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_tiny_massive_intent_kd_bert_and_distilbert` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_tiny_massive_intent_kd_bert_and_distilbert_en_5.5.0_3.0_1727278610224.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_tiny_massive_intent_kd_bert_and_distilbert_en_5.5.0_3.0_1727278610224.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("bert_tiny_massive_intent_kd_bert_and_distilbert","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("bert_tiny_massive_intent_kd_bert_and_distilbert", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_tiny_massive_intent_kd_bert_and_distilbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|16.8 MB| + +## References + +https://huggingface.co/gokuls/bert-tiny-Massive-intent-KD-BERT_and_distilBERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_tiny_massive_intent_kd_bert_and_distilbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_tiny_massive_intent_kd_bert_and_distilbert_pipeline_en.md new file mode 100644 index 00000000000000..26bb05c5019956 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_tiny_massive_intent_kd_bert_and_distilbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_tiny_massive_intent_kd_bert_and_distilbert_pipeline pipeline BertForSequenceClassification from gokuls +author: John Snow Labs +name: bert_tiny_massive_intent_kd_bert_and_distilbert_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_tiny_massive_intent_kd_bert_and_distilbert_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_tiny_massive_intent_kd_bert_and_distilbert_pipeline_en_5.5.0_3.0_1727278611412.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_tiny_massive_intent_kd_bert_and_distilbert_pipeline_en_5.5.0_3.0_1727278611412.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_tiny_massive_intent_kd_bert_and_distilbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_tiny_massive_intent_kd_bert_and_distilbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_tiny_massive_intent_kd_bert_and_distilbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|16.8 MB| + +## References + +https://huggingface.co/gokuls/bert-tiny-Massive-intent-KD-BERT_and_distilBERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_tokenizer_updated_multilingual_words_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-25-bert_tokenizer_updated_multilingual_words_pipeline_xx.md new file mode 100644 index 00000000000000..f9f2929acc01e4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_tokenizer_updated_multilingual_words_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual bert_tokenizer_updated_multilingual_words_pipeline pipeline BertForTokenClassification from junaidali +author: John Snow Labs +name: bert_tokenizer_updated_multilingual_words_pipeline +date: 2024-09-25 +tags: [xx, open_source, pipeline, onnx] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_tokenizer_updated_multilingual_words_pipeline` is a Multilingual model originally trained by junaidali. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_tokenizer_updated_multilingual_words_pipeline_xx_5.5.0_3.0_1727246876908.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_tokenizer_updated_multilingual_words_pipeline_xx_5.5.0_3.0_1727246876908.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_tokenizer_updated_multilingual_words_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_tokenizer_updated_multilingual_words_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_tokenizer_updated_multilingual_words_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|1.0 GB| + +## References + +https://huggingface.co/junaidali/bert_tokenizer_updated_multilingual_words + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_tokenizer_updated_multilingual_words_xx.md b/docs/_posts/ahmedlone127/2024-09-25-bert_tokenizer_updated_multilingual_words_xx.md new file mode 100644 index 00000000000000..89a0ed20d163e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_tokenizer_updated_multilingual_words_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual bert_tokenizer_updated_multilingual_words BertForTokenClassification from junaidali +author: John Snow Labs +name: bert_tokenizer_updated_multilingual_words +date: 2024-09-25 +tags: [xx, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_tokenizer_updated_multilingual_words` is a Multilingual model originally trained by junaidali. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_tokenizer_updated_multilingual_words_xx_5.5.0_3.0_1727246823247.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_tokenizer_updated_multilingual_words_xx_5.5.0_3.0_1727246823247.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_tokenizer_updated_multilingual_words","xx") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_tokenizer_updated_multilingual_words", "xx") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_tokenizer_updated_multilingual_words| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|xx| +|Size:|1.0 GB| + +## References + +https://huggingface.co/junaidali/bert_tokenizer_updated_multilingual_words \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_tonga_tonga_islands_distilbert_ner_zacarage_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_tonga_tonga_islands_distilbert_ner_zacarage_en.md new file mode 100644 index 00000000000000..6077cfe1c0bdbb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_tonga_tonga_islands_distilbert_ner_zacarage_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bert_tonga_tonga_islands_distilbert_ner_zacarage BertForTokenClassification from Zacarage +author: John Snow Labs +name: bert_tonga_tonga_islands_distilbert_ner_zacarage +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_tonga_tonga_islands_distilbert_ner_zacarage` is a English model originally trained by Zacarage. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_tonga_tonga_islands_distilbert_ner_zacarage_en_5.5.0_3.0_1727246436869.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_tonga_tonga_islands_distilbert_ner_zacarage_en_5.5.0_3.0_1727246436869.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bert_tonga_tonga_islands_distilbert_ner_zacarage","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bert_tonga_tonga_islands_distilbert_ner_zacarage", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_tonga_tonga_islands_distilbert_ner_zacarage| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|244.3 MB| + +## References + +https://huggingface.co/Zacarage/bert-to-distilbert-NER \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_tonga_tonga_islands_distilbert_ner_zacarage_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_tonga_tonga_islands_distilbert_ner_zacarage_pipeline_en.md new file mode 100644 index 00000000000000..74bc331faf0dc4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_tonga_tonga_islands_distilbert_ner_zacarage_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_tonga_tonga_islands_distilbert_ner_zacarage_pipeline pipeline BertForTokenClassification from Zacarage +author: John Snow Labs +name: bert_tonga_tonga_islands_distilbert_ner_zacarage_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_tonga_tonga_islands_distilbert_ner_zacarage_pipeline` is a English model originally trained by Zacarage. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_tonga_tonga_islands_distilbert_ner_zacarage_pipeline_en_5.5.0_3.0_1727246449659.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_tonga_tonga_islands_distilbert_ner_zacarage_pipeline_en_5.5.0_3.0_1727246449659.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_tonga_tonga_islands_distilbert_ner_zacarage_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_tonga_tonga_islands_distilbert_ner_zacarage_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_tonga_tonga_islands_distilbert_ner_zacarage_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|244.3 MB| + +## References + +https://huggingface.co/Zacarage/bert-to-distilbert-NER + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bert_twitter_english_lost_job_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bert_twitter_english_lost_job_pipeline_en.md new file mode 100644 index 00000000000000..af6f8c26b98853 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bert_twitter_english_lost_job_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bert_twitter_english_lost_job_pipeline pipeline BertForSequenceClassification from worldbank +author: John Snow Labs +name: bert_twitter_english_lost_job_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bert_twitter_english_lost_job_pipeline` is a English model originally trained by worldbank. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bert_twitter_english_lost_job_pipeline_en_5.5.0_3.0_1727277530447.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bert_twitter_english_lost_job_pipeline_en_5.5.0_3.0_1727277530447.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bert_twitter_english_lost_job_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bert_twitter_english_lost_job_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bert_twitter_english_lost_job_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.1 MB| + +## References + +https://huggingface.co/worldbank/bert-twitter-en-lost-job + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bertmodel_en.md b/docs/_posts/ahmedlone127/2024-09-25-bertmodel_en.md new file mode 100644 index 00000000000000..403e2ae1fc14db --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bertmodel_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English bertmodel BertForTokenClassification from sigaldanilov +author: John Snow Labs +name: bertmodel +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bertmodel` is a English model originally trained by sigaldanilov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bertmodel_en_5.5.0_3.0_1727246283275.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bertmodel_en_5.5.0_3.0_1727246283275.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("bertmodel","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("bertmodel", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bertmodel| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/sigaldanilov/bertmodel \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-bertmodel_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-bertmodel_pipeline_en.md new file mode 100644 index 00000000000000..0379e305d9d912 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-bertmodel_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English bertmodel_pipeline pipeline BertForTokenClassification from sigaldanilov +author: John Snow Labs +name: bertmodel_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`bertmodel_pipeline` is a English model originally trained by sigaldanilov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/bertmodel_pipeline_en_5.5.0_3.0_1727246305633.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/bertmodel_pipeline_en_5.5.0_3.0_1727246305633.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("bertmodel_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("bertmodel_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|bertmodel_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/sigaldanilov/bertmodel + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-best_model_sst_2_16_21_en.md b/docs/_posts/ahmedlone127/2024-09-25-best_model_sst_2_16_21_en.md new file mode 100644 index 00000000000000..6bfc463485d5cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-best_model_sst_2_16_21_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English best_model_sst_2_16_21 BertForSequenceClassification from simonycl +author: John Snow Labs +name: best_model_sst_2_16_21 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`best_model_sst_2_16_21` is a English model originally trained by simonycl. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/best_model_sst_2_16_21_en_5.5.0_3.0_1727266995876.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/best_model_sst_2_16_21_en_5.5.0_3.0_1727266995876.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("best_model_sst_2_16_21","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("best_model_sst_2_16_21", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|best_model_sst_2_16_21| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/simonycl/best_model-sst-2-16-21 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-beto_prescripciones_medicas_es.md b/docs/_posts/ahmedlone127/2024-09-25-beto_prescripciones_medicas_es.md new file mode 100644 index 00000000000000..f8de637f8e1b8a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-beto_prescripciones_medicas_es.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Castilian, Spanish beto_prescripciones_medicas BertForTokenClassification from ccarvajal +author: John Snow Labs +name: beto_prescripciones_medicas +date: 2024-09-25 +tags: [es, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`beto_prescripciones_medicas` is a Castilian, Spanish model originally trained by ccarvajal. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/beto_prescripciones_medicas_es_5.5.0_3.0_1727271089147.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/beto_prescripciones_medicas_es_5.5.0_3.0_1727271089147.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("beto_prescripciones_medicas","es") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("beto_prescripciones_medicas", "es") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|beto_prescripciones_medicas| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|es| +|Size:|409.5 MB| + +## References + +https://huggingface.co/ccarvajal/beto-prescripciones-medicas \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-beto_sentiment_analysis_finetuned_onpremise_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-beto_sentiment_analysis_finetuned_onpremise_pipeline_en.md new file mode 100644 index 00000000000000..f2d6f9c7b9e76a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-beto_sentiment_analysis_finetuned_onpremise_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English beto_sentiment_analysis_finetuned_onpremise_pipeline pipeline BertForSequenceClassification from Cristian-dcg +author: John Snow Labs +name: beto_sentiment_analysis_finetuned_onpremise_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`beto_sentiment_analysis_finetuned_onpremise_pipeline` is a English model originally trained by Cristian-dcg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/beto_sentiment_analysis_finetuned_onpremise_pipeline_en_5.5.0_3.0_1727263946307.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/beto_sentiment_analysis_finetuned_onpremise_pipeline_en_5.5.0_3.0_1727263946307.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("beto_sentiment_analysis_finetuned_onpremise_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("beto_sentiment_analysis_finetuned_onpremise_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|beto_sentiment_analysis_finetuned_onpremise_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|411.7 MB| + +## References + +https://huggingface.co/Cristian-dcg/beto-sentiment-analysis-finetuned-onpremise + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-biobert_huner_cell_v1_en.md b/docs/_posts/ahmedlone127/2024-09-25-biobert_huner_cell_v1_en.md new file mode 100644 index 00000000000000..5f2fbb14de4c14 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-biobert_huner_cell_v1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English biobert_huner_cell_v1 BertForTokenClassification from aitslab +author: John Snow Labs +name: biobert_huner_cell_v1 +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`biobert_huner_cell_v1` is a English model originally trained by aitslab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/biobert_huner_cell_v1_en_5.5.0_3.0_1727246222103.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/biobert_huner_cell_v1_en_5.5.0_3.0_1727246222103.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("biobert_huner_cell_v1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("biobert_huner_cell_v1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|biobert_huner_cell_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.1 MB| + +## References + +https://huggingface.co/aitslab/biobert_huner_cell_v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-biobert_huner_disease_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-biobert_huner_disease_v1_pipeline_en.md new file mode 100644 index 00000000000000..00ad252e2f275f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-biobert_huner_disease_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English biobert_huner_disease_v1_pipeline pipeline BertForTokenClassification from aitslab +author: John Snow Labs +name: biobert_huner_disease_v1_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`biobert_huner_disease_v1_pipeline` is a English model originally trained by aitslab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/biobert_huner_disease_v1_pipeline_en_5.5.0_3.0_1727280441019.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/biobert_huner_disease_v1_pipeline_en_5.5.0_3.0_1727280441019.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("biobert_huner_disease_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("biobert_huner_disease_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|biobert_huner_disease_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.1 MB| + +## References + +https://huggingface.co/aitslab/biobert_huner_disease_v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-biobit_drugtemist_italian_ner_en.md b/docs/_posts/ahmedlone127/2024-09-25-biobit_drugtemist_italian_ner_en.md new file mode 100644 index 00000000000000..7f3082ae818a42 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-biobit_drugtemist_italian_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English biobit_drugtemist_italian_ner BertForTokenClassification from Rodrigo1771 +author: John Snow Labs +name: biobit_drugtemist_italian_ner +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`biobit_drugtemist_italian_ner` is a English model originally trained by Rodrigo1771. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/biobit_drugtemist_italian_ner_en_5.5.0_3.0_1727258955213.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/biobit_drugtemist_italian_ner_en_5.5.0_3.0_1727258955213.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("biobit_drugtemist_italian_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("biobit_drugtemist_italian_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|biobit_drugtemist_italian_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|409.2 MB| + +## References + +https://huggingface.co/Rodrigo1771/bioBIT-drugtemist-it-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-biomednlp_pubmedbert_proteinstructure_ner_v1_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-biomednlp_pubmedbert_proteinstructure_ner_v1_2_pipeline_en.md new file mode 100644 index 00000000000000..05851fb62635d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-biomednlp_pubmedbert_proteinstructure_ner_v1_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English biomednlp_pubmedbert_proteinstructure_ner_v1_2_pipeline pipeline BertForTokenClassification from PDBEurope +author: John Snow Labs +name: biomednlp_pubmedbert_proteinstructure_ner_v1_2_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`biomednlp_pubmedbert_proteinstructure_ner_v1_2_pipeline` is a English model originally trained by PDBEurope. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/biomednlp_pubmedbert_proteinstructure_ner_v1_2_pipeline_en_5.5.0_3.0_1727275559505.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/biomednlp_pubmedbert_proteinstructure_ner_v1_2_pipeline_en_5.5.0_3.0_1727275559505.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("biomednlp_pubmedbert_proteinstructure_ner_v1_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("biomednlp_pubmedbert_proteinstructure_ner_v1_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|biomednlp_pubmedbert_proteinstructure_ner_v1_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.3 MB| + +## References + +https://huggingface.co/PDBEurope/BiomedNLP-PubMedBERT-ProteinStructure-NER-v1.2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-boss_toxicity_24000_bert_base_uncased_en.md b/docs/_posts/ahmedlone127/2024-09-25-boss_toxicity_24000_bert_base_uncased_en.md new file mode 100644 index 00000000000000..67a001d658d96a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-boss_toxicity_24000_bert_base_uncased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English boss_toxicity_24000_bert_base_uncased BertForSequenceClassification from Kyle1668 +author: John Snow Labs +name: boss_toxicity_24000_bert_base_uncased +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`boss_toxicity_24000_bert_base_uncased` is a English model originally trained by Kyle1668. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/boss_toxicity_24000_bert_base_uncased_en_5.5.0_3.0_1727264076701.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/boss_toxicity_24000_bert_base_uncased_en_5.5.0_3.0_1727264076701.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("boss_toxicity_24000_bert_base_uncased","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("boss_toxicity_24000_bert_base_uncased", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|boss_toxicity_24000_bert_base_uncased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Kyle1668/boss-toxicity-24000-bert-base-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-burmese_awesome_pubmed_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-burmese_awesome_pubmed_bert_pipeline_en.md new file mode 100644 index 00000000000000..9b08f9f5cf98f0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-burmese_awesome_pubmed_bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_pubmed_bert_pipeline pipeline BertForTokenClassification from arunavsk1 +author: John Snow Labs +name: burmese_awesome_pubmed_bert_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_pubmed_bert_pipeline` is a English model originally trained by arunavsk1. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_pubmed_bert_pipeline_en_5.5.0_3.0_1727247467774.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_pubmed_bert_pipeline_en_5.5.0_3.0_1727247467774.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_pubmed_bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_pubmed_bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_pubmed_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/arunavsk1/my-awesome-pubmed-bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-burmese_awesome_wnut_model_niharikavats2397_en.md b/docs/_posts/ahmedlone127/2024-09-25-burmese_awesome_wnut_model_niharikavats2397_en.md new file mode 100644 index 00000000000000..2a1aa22aaa9985 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-burmese_awesome_wnut_model_niharikavats2397_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English burmese_awesome_wnut_model_niharikavats2397 BertForTokenClassification from niharikavats2397 +author: John Snow Labs +name: burmese_awesome_wnut_model_niharikavats2397 +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_wnut_model_niharikavats2397` is a English model originally trained by niharikavats2397. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_niharikavats2397_en_5.5.0_3.0_1727246600387.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_niharikavats2397_en_5.5.0_3.0_1727246600387.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("burmese_awesome_wnut_model_niharikavats2397","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("burmese_awesome_wnut_model_niharikavats2397", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_wnut_model_niharikavats2397| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.3 MB| + +## References + +https://huggingface.co/niharikavats2397/my_awesome_wnut_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-burmese_awesome_wnut_model_niharikavats2397_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-burmese_awesome_wnut_model_niharikavats2397_pipeline_en.md new file mode 100644 index 00000000000000..3929854427a80c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-burmese_awesome_wnut_model_niharikavats2397_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English burmese_awesome_wnut_model_niharikavats2397_pipeline pipeline BertForTokenClassification from niharikavats2397 +author: John Snow Labs +name: burmese_awesome_wnut_model_niharikavats2397_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`burmese_awesome_wnut_model_niharikavats2397_pipeline` is a English model originally trained by niharikavats2397. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_niharikavats2397_pipeline_en_5.5.0_3.0_1727246621542.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/burmese_awesome_wnut_model_niharikavats2397_pipeline_en_5.5.0_3.0_1727246621542.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("burmese_awesome_wnut_model_niharikavats2397_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("burmese_awesome_wnut_model_niharikavats2397_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|burmese_awesome_wnut_model_niharikavats2397_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.4 MB| + +## References + +https://huggingface.co/niharikavats2397/my_awesome_wnut_model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-camedbert_512_fl32_checkpoint_17386_de.md b/docs/_posts/ahmedlone127/2024-09-25-camedbert_512_fl32_checkpoint_17386_de.md new file mode 100644 index 00000000000000..90cb764113722c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-camedbert_512_fl32_checkpoint_17386_de.md @@ -0,0 +1,94 @@ +--- +layout: model +title: German camedbert_512_fl32_checkpoint_17386 BertForTokenClassification from MSey +author: John Snow Labs +name: camedbert_512_fl32_checkpoint_17386 +date: 2024-09-25 +tags: [de, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`camedbert_512_fl32_checkpoint_17386` is a German model originally trained by MSey. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/camedbert_512_fl32_checkpoint_17386_de_5.5.0_3.0_1727247124891.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/camedbert_512_fl32_checkpoint_17386_de_5.5.0_3.0_1727247124891.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("camedbert_512_fl32_checkpoint_17386","de") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("camedbert_512_fl32_checkpoint_17386", "de") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|camedbert_512_fl32_checkpoint_17386| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|de| +|Size:|406.9 MB| + +## References + +https://huggingface.co/MSey/CaMedBERT-512_fl32_checkpoint-17386 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-cares_bert_base_en.md b/docs/_posts/ahmedlone127/2024-09-25-cares_bert_base_en.md new file mode 100644 index 00000000000000..9bbf09f69c784a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-cares_bert_base_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cares_bert_base BertForSequenceClassification from chizhikchi +author: John Snow Labs +name: cares_bert_base +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cares_bert_base` is a English model originally trained by chizhikchi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cares_bert_base_en_5.5.0_3.0_1727285262658.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cares_bert_base_en_5.5.0_3.0_1727285262658.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("cares_bert_base","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("cares_bert_base", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cares_bert_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|411.7 MB| + +## References + +https://huggingface.co/chizhikchi/cares-bert-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-case_analysis_inlegalbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-case_analysis_inlegalbert_pipeline_en.md new file mode 100644 index 00000000000000..1922fc82ed74f7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-case_analysis_inlegalbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English case_analysis_inlegalbert_pipeline pipeline BertForSequenceClassification from cite-text-analysis +author: John Snow Labs +name: case_analysis_inlegalbert_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`case_analysis_inlegalbert_pipeline` is a English model originally trained by cite-text-analysis. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/case_analysis_inlegalbert_pipeline_en_5.5.0_3.0_1727263755028.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/case_analysis_inlegalbert_pipeline_en_5.5.0_3.0_1727263755028.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("case_analysis_inlegalbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("case_analysis_inlegalbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|case_analysis_inlegalbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/cite-text-analysis/case-analysis-InLegalBERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-chilean_spanish_hate_speech_pipeline_es.md b/docs/_posts/ahmedlone127/2024-09-25-chilean_spanish_hate_speech_pipeline_es.md new file mode 100644 index 00000000000000..9252683571bbce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-chilean_spanish_hate_speech_pipeline_es.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Castilian, Spanish chilean_spanish_hate_speech_pipeline pipeline BertForSequenceClassification from jorgeortizfuentes +author: John Snow Labs +name: chilean_spanish_hate_speech_pipeline +date: 2024-09-25 +tags: [es, open_source, pipeline, onnx] +task: Text Classification +language: es +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`chilean_spanish_hate_speech_pipeline` is a Castilian, Spanish model originally trained by jorgeortizfuentes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/chilean_spanish_hate_speech_pipeline_es_5.5.0_3.0_1727245880418.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/chilean_spanish_hate_speech_pipeline_es_5.5.0_3.0_1727245880418.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("chilean_spanish_hate_speech_pipeline", lang = "es") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("chilean_spanish_hate_speech_pipeline", lang = "es") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|chilean_spanish_hate_speech_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|es| +|Size:|411.6 MB| + +## References + +https://huggingface.co/jorgeortizfuentes/chilean-spanish-hate-speech + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-ckiplab_albert_base_chinese_david_ner_en.md b/docs/_posts/ahmedlone127/2024-09-25-ckiplab_albert_base_chinese_david_ner_en.md new file mode 100644 index 00000000000000..bbbb36198ca9e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-ckiplab_albert_base_chinese_david_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ckiplab_albert_base_chinese_david_ner BertForTokenClassification from davidliu1110 +author: John Snow Labs +name: ckiplab_albert_base_chinese_david_ner +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ckiplab_albert_base_chinese_david_ner` is a English model originally trained by davidliu1110. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ckiplab_albert_base_chinese_david_ner_en_5.5.0_3.0_1727249669813.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ckiplab_albert_base_chinese_david_ner_en_5.5.0_3.0_1727249669813.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("ckiplab_albert_base_chinese_david_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("ckiplab_albert_base_chinese_david_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ckiplab_albert_base_chinese_david_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|37.6 MB| + +## References + +https://huggingface.co/davidliu1110/ckiplab-albert-base-chinese-david-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-ckiplab_albert_base_chinese_david_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-ckiplab_albert_base_chinese_david_ner_pipeline_en.md new file mode 100644 index 00000000000000..05fc000044d35e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-ckiplab_albert_base_chinese_david_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ckiplab_albert_base_chinese_david_ner_pipeline pipeline BertForTokenClassification from davidliu1110 +author: John Snow Labs +name: ckiplab_albert_base_chinese_david_ner_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ckiplab_albert_base_chinese_david_ner_pipeline` is a English model originally trained by davidliu1110. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ckiplab_albert_base_chinese_david_ner_pipeline_en_5.5.0_3.0_1727249671995.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ckiplab_albert_base_chinese_david_ner_pipeline_en_5.5.0_3.0_1727249671995.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ckiplab_albert_base_chinese_david_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ckiplab_albert_base_chinese_david_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ckiplab_albert_base_chinese_david_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|37.6 MB| + +## References + +https://huggingface.co/davidliu1110/ckiplab-albert-base-chinese-david-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-clasificador_poem_sentiment_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-clasificador_poem_sentiment_pipeline_en.md new file mode 100644 index 00000000000000..f2ac974be0c5b0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-clasificador_poem_sentiment_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English clasificador_poem_sentiment_pipeline pipeline BertForSequenceClassification from joheras +author: John Snow Labs +name: clasificador_poem_sentiment_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`clasificador_poem_sentiment_pipeline` is a English model originally trained by joheras. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clasificador_poem_sentiment_pipeline_en_5.5.0_3.0_1727276929740.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clasificador_poem_sentiment_pipeline_en_5.5.0_3.0_1727276929740.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("clasificador_poem_sentiment_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("clasificador_poem_sentiment_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clasificador_poem_sentiment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/joheras/clasificador-poem-sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-classicalchineseletterclassification_pipeline_zh.md b/docs/_posts/ahmedlone127/2024-09-25-classicalchineseletterclassification_pipeline_zh.md new file mode 100644 index 00000000000000..c79141b9f062bd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-classicalchineseletterclassification_pipeline_zh.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Chinese classicalchineseletterclassification_pipeline pipeline BertForSequenceClassification from cbdb +author: John Snow Labs +name: classicalchineseletterclassification_pipeline +date: 2024-09-25 +tags: [zh, open_source, pipeline, onnx] +task: Text Classification +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classicalchineseletterclassification_pipeline` is a Chinese model originally trained by cbdb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classicalchineseletterclassification_pipeline_zh_5.5.0_3.0_1727267123675.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classicalchineseletterclassification_pipeline_zh_5.5.0_3.0_1727267123675.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("classicalchineseletterclassification_pipeline", lang = "zh") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("classicalchineseletterclassification_pipeline", lang = "zh") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classicalchineseletterclassification_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|zh| +|Size:|383.3 MB| + +## References + +https://huggingface.co/cbdb/ClassicalChineseLetterClassification + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-classicalchineseletterclassification_zh.md b/docs/_posts/ahmedlone127/2024-09-25-classicalchineseletterclassification_zh.md new file mode 100644 index 00000000000000..a582014c71ffcd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-classicalchineseletterclassification_zh.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Chinese classicalchineseletterclassification BertForSequenceClassification from cbdb +author: John Snow Labs +name: classicalchineseletterclassification +date: 2024-09-25 +tags: [zh, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: zh +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classicalchineseletterclassification` is a Chinese model originally trained by cbdb. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classicalchineseletterclassification_zh_5.5.0_3.0_1727267102616.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classicalchineseletterclassification_zh_5.5.0_3.0_1727267102616.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("classicalchineseletterclassification","zh") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("classicalchineseletterclassification", "zh") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classicalchineseletterclassification| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|zh| +|Size:|383.3 MB| + +## References + +https://huggingface.co/cbdb/ClassicalChineseLetterClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-classifier_theojolliffe_en.md b/docs/_posts/ahmedlone127/2024-09-25-classifier_theojolliffe_en.md new file mode 100644 index 00000000000000..1aebe4faee0112 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-classifier_theojolliffe_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English classifier_theojolliffe BertForSequenceClassification from theojolliffe +author: John Snow Labs +name: classifier_theojolliffe +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classifier_theojolliffe` is a English model originally trained by theojolliffe. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classifier_theojolliffe_en_5.5.0_3.0_1727266697558.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classifier_theojolliffe_en_5.5.0_3.0_1727266697558.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("classifier_theojolliffe","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("classifier_theojolliffe", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classifier_theojolliffe| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|415.8 MB| + +## References + +https://huggingface.co/theojolliffe/classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-classifier_theojolliffe_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-classifier_theojolliffe_pipeline_en.md new file mode 100644 index 00000000000000..bcc635aac4bb00 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-classifier_theojolliffe_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English classifier_theojolliffe_pipeline pipeline BertForSequenceClassification from theojolliffe +author: John Snow Labs +name: classifier_theojolliffe_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`classifier_theojolliffe_pipeline` is a English model originally trained by theojolliffe. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/classifier_theojolliffe_pipeline_en_5.5.0_3.0_1727266719676.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/classifier_theojolliffe_pipeline_en_5.5.0_3.0_1727266719676.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("classifier_theojolliffe_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("classifier_theojolliffe_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|classifier_theojolliffe_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|415.8 MB| + +## References + +https://huggingface.co/theojolliffe/classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-clinicalbert_finetuned_en.md b/docs/_posts/ahmedlone127/2024-09-25-clinicalbert_finetuned_en.md new file mode 100644 index 00000000000000..c02c9d611d0f00 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-clinicalbert_finetuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English clinicalbert_finetuned BertForSequenceClassification from SrinivasaPragada +author: John Snow Labs +name: clinicalbert_finetuned +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`clinicalbert_finetuned` is a English model originally trained by SrinivasaPragada. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/clinicalbert_finetuned_en_5.5.0_3.0_1727254489266.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/clinicalbert_finetuned_en_5.5.0_3.0_1727254489266.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("clinicalbert_finetuned","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("clinicalbert_finetuned", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|clinicalbert_finetuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.6 MB| + +## References + +https://huggingface.co/SrinivasaPragada/clinicalbert-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-cnc_v2_st1_csc_en.md b/docs/_posts/ahmedlone127/2024-09-25-cnc_v2_st1_csc_en.md new file mode 100644 index 00000000000000..e6ae349a4272b1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-cnc_v2_st1_csc_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cnc_v2_st1_csc BertForSequenceClassification from tanfiona +author: John Snow Labs +name: cnc_v2_st1_csc +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cnc_v2_st1_csc` is a English model originally trained by tanfiona. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cnc_v2_st1_csc_en_5.5.0_3.0_1727269584466.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cnc_v2_st1_csc_en_5.5.0_3.0_1727269584466.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("cnc_v2_st1_csc","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("cnc_v2_st1_csc", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cnc_v2_st1_csc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/tanfiona/cnc-v2-st1-csc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-cnc_v2_st1_csc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-cnc_v2_st1_csc_pipeline_en.md new file mode 100644 index 00000000000000..29cc011c2c3ed5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-cnc_v2_st1_csc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cnc_v2_st1_csc_pipeline pipeline BertForSequenceClassification from tanfiona +author: John Snow Labs +name: cnc_v2_st1_csc_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cnc_v2_st1_csc_pipeline` is a English model originally trained by tanfiona. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cnc_v2_st1_csc_pipeline_en_5.5.0_3.0_1727269605981.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cnc_v2_st1_csc_pipeline_en_5.5.0_3.0_1727269605981.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cnc_v2_st1_csc_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cnc_v2_st1_csc_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cnc_v2_st1_csc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/tanfiona/cnc-v2-st1-csc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-conjunction_classification_finetuned_en.md b/docs/_posts/ahmedlone127/2024-09-25-conjunction_classification_finetuned_en.md new file mode 100644 index 00000000000000..847861e24ace54 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-conjunction_classification_finetuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English conjunction_classification_finetuned BertForSequenceClassification from nhanpv +author: John Snow Labs +name: conjunction_classification_finetuned +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`conjunction_classification_finetuned` is a English model originally trained by nhanpv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/conjunction_classification_finetuned_en_5.5.0_3.0_1727288691878.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/conjunction_classification_finetuned_en_5.5.0_3.0_1727288691878.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("conjunction_classification_finetuned","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("conjunction_classification_finetuned", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|conjunction_classification_finetuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/nhanpv/conjunction-classification-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-conjunction_classification_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-conjunction_classification_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..ebc661c1740428 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-conjunction_classification_finetuned_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English conjunction_classification_finetuned_pipeline pipeline BertForSequenceClassification from nhanpv +author: John Snow Labs +name: conjunction_classification_finetuned_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`conjunction_classification_finetuned_pipeline` is a English model originally trained by nhanpv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/conjunction_classification_finetuned_pipeline_en_5.5.0_3.0_1727288718154.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/conjunction_classification_finetuned_pipeline_en_5.5.0_3.0_1727288718154.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("conjunction_classification_finetuned_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("conjunction_classification_finetuned_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|conjunction_classification_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/nhanpv/conjunction-classification-finetuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-consumer_complaint_categorization_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-consumer_complaint_categorization_pipeline_en.md new file mode 100644 index 00000000000000..4db5499dfc12a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-consumer_complaint_categorization_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English consumer_complaint_categorization_pipeline pipeline BertForSequenceClassification from ThirdEyeData +author: John Snow Labs +name: consumer_complaint_categorization_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`consumer_complaint_categorization_pipeline` is a English model originally trained by ThirdEyeData. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/consumer_complaint_categorization_pipeline_en_5.5.0_3.0_1727245845783.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/consumer_complaint_categorization_pipeline_en_5.5.0_3.0_1727245845783.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("consumer_complaint_categorization_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("consumer_complaint_categorization_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|consumer_complaint_categorization_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/ThirdEyeData/Consumer-Complaint-Categorization + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47_en.md b/docs/_posts/ahmedlone127/2024-09-25-correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47_en.md new file mode 100644 index 00000000000000..d3209048a792f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47 BertForTokenClassification from ali2066 +author: John Snow Labs +name: correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47 +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47` is a English model originally trained by ali2066. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47_en_5.5.0_3.0_1727270650498.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47_en_5.5.0_3.0_1727270650498.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/ali2066/correct_BERT_token_itr0_0.0001_essays_01_03_2022-15_48_47 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47_pipeline_en.md new file mode 100644 index 00000000000000..a068e7fde012a8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47_pipeline pipeline BertForTokenClassification from ali2066 +author: John Snow Labs +name: correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47_pipeline` is a English model originally trained by ali2066. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47_pipeline_en_5.5.0_3.0_1727270671729.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47_pipeline_en_5.5.0_3.0_1727270671729.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|correct_bert_token_itr0_0_0001_essays_01_03_2022_15_48_47_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/ali2066/correct_BERT_token_itr0_0.0001_essays_01_03_2022-15_48_47 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-crypto_sentiment_analysis_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-crypto_sentiment_analysis_bert_pipeline_en.md new file mode 100644 index 00000000000000..53efac5f0b324c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-crypto_sentiment_analysis_bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English crypto_sentiment_analysis_bert_pipeline pipeline BertForSequenceClassification from Robertuus +author: John Snow Labs +name: crypto_sentiment_analysis_bert_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`crypto_sentiment_analysis_bert_pipeline` is a English model originally trained by Robertuus. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/crypto_sentiment_analysis_bert_pipeline_en_5.5.0_3.0_1727285070915.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/crypto_sentiment_analysis_bert_pipeline_en_5.5.0_3.0_1727285070915.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("crypto_sentiment_analysis_bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("crypto_sentiment_analysis_bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|crypto_sentiment_analysis_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Robertuus/Crypto_Sentiment_Analysis_Bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-crypto_sentiment_en.md b/docs/_posts/ahmedlone127/2024-09-25-crypto_sentiment_en.md new file mode 100644 index 00000000000000..5f3f66a2a1c48c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-crypto_sentiment_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English crypto_sentiment BertForSequenceClassification from ckandemir +author: John Snow Labs +name: crypto_sentiment +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`crypto_sentiment` is a English model originally trained by ckandemir. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/crypto_sentiment_en_5.5.0_3.0_1727268448023.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/crypto_sentiment_en_5.5.0_3.0_1727268448023.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("crypto_sentiment","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("crypto_sentiment", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|crypto_sentiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/ckandemir/crypto_sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-crypto_sentiment_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-crypto_sentiment_pipeline_en.md new file mode 100644 index 00000000000000..7efc59467d69ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-crypto_sentiment_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English crypto_sentiment_pipeline pipeline BertForSequenceClassification from ckandemir +author: John Snow Labs +name: crypto_sentiment_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`crypto_sentiment_pipeline` is a English model originally trained by ckandemir. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/crypto_sentiment_pipeline_en_5.5.0_3.0_1727268470814.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/crypto_sentiment_pipeline_en_5.5.0_3.0_1727268470814.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("crypto_sentiment_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("crypto_sentiment_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|crypto_sentiment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/ckandemir/crypto_sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-cvai_bert_asag_en.md b/docs/_posts/ahmedlone127/2024-09-25-cvai_bert_asag_en.md new file mode 100644 index 00000000000000..cfc47fff6376a7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-cvai_bert_asag_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English cvai_bert_asag BertForSequenceClassification from johnpaulbin +author: John Snow Labs +name: cvai_bert_asag +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cvai_bert_asag` is a English model originally trained by johnpaulbin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cvai_bert_asag_en_5.5.0_3.0_1727286207987.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cvai_bert_asag_en_5.5.0_3.0_1727286207987.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("cvai_bert_asag","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("cvai_bert_asag", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cvai_bert_asag| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/johnpaulbin/cvai-bert-asag \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-cvai_bert_asag_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-cvai_bert_asag_pipeline_en.md new file mode 100644 index 00000000000000..bc604c5c661f39 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-cvai_bert_asag_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English cvai_bert_asag_pipeline pipeline BertForSequenceClassification from johnpaulbin +author: John Snow Labs +name: cvai_bert_asag_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`cvai_bert_asag_pipeline` is a English model originally trained by johnpaulbin. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/cvai_bert_asag_pipeline_en_5.5.0_3.0_1727286229187.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/cvai_bert_asag_pipeline_en_5.5.0_3.0_1727286229187.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("cvai_bert_asag_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("cvai_bert_asag_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|cvai_bert_asag_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/johnpaulbin/cvai-bert-asag + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-danish_bert_en.md b/docs/_posts/ahmedlone127/2024-09-25-danish_bert_en.md new file mode 100644 index 00000000000000..63176340962c83 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-danish_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English danish_bert BertEmbeddings from iolariu +author: John Snow Labs +name: danish_bert +date: 2024-09-25 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`danish_bert` is a English model originally trained by iolariu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/danish_bert_en_5.5.0_3.0_1727232586705.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/danish_bert_en_5.5.0_3.0_1727232586705.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("danish_bert","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("danish_bert","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|danish_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|406.0 MB| + +## References + +https://huggingface.co/iolariu/DA_BERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-dbpedia_classes_bert_base_uncased_few_20_en.md b/docs/_posts/ahmedlone127/2024-09-25-dbpedia_classes_bert_base_uncased_few_20_en.md new file mode 100644 index 00000000000000..21acbd6c8e0b04 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-dbpedia_classes_bert_base_uncased_few_20_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English dbpedia_classes_bert_base_uncased_few_20 BertForSequenceClassification from TheChickenAgent +author: John Snow Labs +name: dbpedia_classes_bert_base_uncased_few_20 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dbpedia_classes_bert_base_uncased_few_20` is a English model originally trained by TheChickenAgent. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dbpedia_classes_bert_base_uncased_few_20_en_5.5.0_3.0_1727286884203.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dbpedia_classes_bert_base_uncased_few_20_en_5.5.0_3.0_1727286884203.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("dbpedia_classes_bert_base_uncased_few_20","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("dbpedia_classes_bert_base_uncased_few_20", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dbpedia_classes_bert_base_uncased_few_20| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/TheChickenAgent/DBPedia_Classes_BERT-base-uncased-few-20 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-dbpedia_classes_bert_base_uncased_few_20_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-dbpedia_classes_bert_base_uncased_few_20_pipeline_en.md new file mode 100644 index 00000000000000..d167e65599cad0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-dbpedia_classes_bert_base_uncased_few_20_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English dbpedia_classes_bert_base_uncased_few_20_pipeline pipeline BertForSequenceClassification from TheChickenAgent +author: John Snow Labs +name: dbpedia_classes_bert_base_uncased_few_20_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dbpedia_classes_bert_base_uncased_few_20_pipeline` is a English model originally trained by TheChickenAgent. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dbpedia_classes_bert_base_uncased_few_20_pipeline_en_5.5.0_3.0_1727286905887.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dbpedia_classes_bert_base_uncased_few_20_pipeline_en_5.5.0_3.0_1727286905887.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("dbpedia_classes_bert_base_uncased_few_20_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("dbpedia_classes_bert_base_uncased_few_20_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dbpedia_classes_bert_base_uncased_few_20_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/TheChickenAgent/DBPedia_Classes_BERT-base-uncased-few-20 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-decision_bert_bio_en.md b/docs/_posts/ahmedlone127/2024-09-25-decision_bert_bio_en.md new file mode 100644 index 00000000000000..0dc7ac2fa1add4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-decision_bert_bio_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English decision_bert_bio BertForSequenceClassification from k-partha +author: John Snow Labs +name: decision_bert_bio +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`decision_bert_bio` is a English model originally trained by k-partha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/decision_bert_bio_en_5.5.0_3.0_1727273102482.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/decision_bert_bio_en_5.5.0_3.0_1727273102482.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("decision_bert_bio","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("decision_bert_bio", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|decision_bert_bio| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/k-partha/decision_bert_bio \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-decision_bert_bio_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-decision_bert_bio_pipeline_en.md new file mode 100644 index 00000000000000..e74c2559d67dba --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-decision_bert_bio_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English decision_bert_bio_pipeline pipeline BertForSequenceClassification from k-partha +author: John Snow Labs +name: decision_bert_bio_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`decision_bert_bio_pipeline` is a English model originally trained by k-partha. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/decision_bert_bio_pipeline_en_5.5.0_3.0_1727273137493.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/decision_bert_bio_pipeline_en_5.5.0_3.0_1727273137493.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("decision_bert_bio_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("decision_bert_bio_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|decision_bert_bio_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/k-partha/decision_bert_bio + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-destractive_context_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-destractive_context_pipeline_en.md new file mode 100644 index 00000000000000..63a93aadc03014 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-destractive_context_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English destractive_context_pipeline pipeline BertForSequenceClassification from Vlad1m +author: John Snow Labs +name: destractive_context_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`destractive_context_pipeline` is a English model originally trained by Vlad1m. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/destractive_context_pipeline_en_5.5.0_3.0_1727261275422.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/destractive_context_pipeline_en_5.5.0_3.0_1727261275422.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("destractive_context_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("destractive_context_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|destractive_context_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.6 GB| + +## References + +https://huggingface.co/Vlad1m/destractive_context + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-dialect_msa_detection_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-dialect_msa_detection_pipeline_en.md new file mode 100644 index 00000000000000..bbed8ef2e34011 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-dialect_msa_detection_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English dialect_msa_detection_pipeline pipeline XlmRoBertaForSequenceClassification from sadanyh +author: John Snow Labs +name: dialect_msa_detection_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dialect_msa_detection_pipeline` is a English model originally trained by sadanyh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dialect_msa_detection_pipeline_en_5.5.0_3.0_1727229668451.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dialect_msa_detection_pipeline_en_5.5.0_3.0_1727229668451.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("dialect_msa_detection_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("dialect_msa_detection_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dialect_msa_detection_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|782.4 MB| + +## References + +https://huggingface.co/sadanyh/Dialect-MSA-detection + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-dialogue_final_model_en.md b/docs/_posts/ahmedlone127/2024-09-25-dialogue_final_model_en.md new file mode 100644 index 00000000000000..fcec274221a7cb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-dialogue_final_model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English dialogue_final_model BertForSequenceClassification from SharonTudi +author: John Snow Labs +name: dialogue_final_model +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`dialogue_final_model` is a English model originally trained by SharonTudi. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/dialogue_final_model_en_5.5.0_3.0_1727288753786.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/dialogue_final_model_en_5.5.0_3.0_1727288753786.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("dialogue_final_model","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("dialogue_final_model", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|dialogue_final_model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/SharonTudi/DIALOGUE_final_model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-distilbert_base_cased_en.md b/docs/_posts/ahmedlone127/2024-09-25-distilbert_base_cased_en.md new file mode 100644 index 00000000000000..88e6c4ebf0be21 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-distilbert_base_cased_en.md @@ -0,0 +1,96 @@ +--- +layout: model +title: DistilBERT base model (cased) +author: John Snow Labs +name: distilbert_base_cased +date: 2024-09-25 +tags: [distilbert, en, english, open_source, embeddings, onnx, openvino] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: openvino +annotator: DistilBertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This model is a distilled version of the [BERT base model](https://huggingface.co/bert-base-cased). It was introduced in [this paper](https://arxiv.org/abs/1910.01108). The code for the distillation process can be found [here](https://github.com/huggingface/transformers/tree/master/examples/research_projects/distillation). This model is cased: it does make a difference between english and English. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_en_5.5.0_3.0_1727268763405.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_cased_en_5.5.0_3.0_1727268763405.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + +{:.model-param} + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +embeddings = DistilBertEmbeddings.pretrained("distilbert_base_cased", "en") \ +.setInputCols("sentence", "token") \ +.setOutputCol("embeddings") +nlp_pipeline = Pipeline(stages=[document_assembler, sentence_detector, tokenizer, embeddings]) +``` +```scala +val embeddings = DistilBertEmbeddings.pretrained("distilbert_base_cased", "en") +.setInputCols("sentence", "token") +.setOutputCol("embeddings") +val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, tokenizer, embeddings)) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.embed.distilbert").predict("""Put your text here.""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|243.6 MB| +|Case sensitive:|false| +|Max sentence length:|512| + +## References + +References + +[https://huggingface.co/distilbert-base-cased](https://huggingface.co/distilbert-base-cased) + +## Benchmarking + +```bash + +Benchmarking + + +When fine-tuned on downstream tasks, this model achieves the following results: + +Glue test results: + +| Task | MNLI | QQP | QNLI | SST-2 | CoLA | STS-B | MRPC | RTE | +|:----:|:----:|:----:|:----:|:-----:|:----:|:-----:|:----:|:----:| +| | 81.5 | 87.8 | 88.2 | 90.4 | 47.2 | 85.5 | 85.6 | 60.6 | +``` \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-distilbert_base_uncased_accelerate_en.md b/docs/_posts/ahmedlone127/2024-09-25-distilbert_base_uncased_accelerate_en.md new file mode 100644 index 00000000000000..d887395f391ff2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-distilbert_base_uncased_accelerate_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_base_uncased_accelerate BertForTokenClassification from NSandra +author: John Snow Labs +name: distilbert_base_uncased_accelerate +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_base_uncased_accelerate` is a English model originally trained by NSandra. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_accelerate_en_5.5.0_3.0_1727283620759.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_base_uncased_accelerate_en_5.5.0_3.0_1727283620759.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("distilbert_base_uncased_accelerate","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("distilbert_base_uncased_accelerate", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_base_uncased_accelerate| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/NSandra/distilbert-base-uncased-accelerate \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-distilbert_emotion_yenicerisgk_en.md b/docs/_posts/ahmedlone127/2024-09-25-distilbert_emotion_yenicerisgk_en.md new file mode 100644 index 00000000000000..b6bfbacbce2401 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-distilbert_emotion_yenicerisgk_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_emotion_yenicerisgk BertForSequenceClassification from yeniceriSGK +author: John Snow Labs +name: distilbert_emotion_yenicerisgk +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_yenicerisgk` is a English model originally trained by yeniceriSGK. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_yenicerisgk_en_5.5.0_3.0_1727237438555.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_yenicerisgk_en_5.5.0_3.0_1727237438555.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("distilbert_emotion_yenicerisgk","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("distilbert_emotion_yenicerisgk", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_yenicerisgk| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/yeniceriSGK/distilbert-emotion \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-distilbert_emotion_yenicerisgk_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-distilbert_emotion_yenicerisgk_pipeline_en.md new file mode 100644 index 00000000000000..ee22b1dc3ca919 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-distilbert_emotion_yenicerisgk_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_emotion_yenicerisgk_pipeline pipeline BertForSequenceClassification from yeniceriSGK +author: John Snow Labs +name: distilbert_emotion_yenicerisgk_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_emotion_yenicerisgk_pipeline` is a English model originally trained by yeniceriSGK. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_emotion_yenicerisgk_pipeline_en_5.5.0_3.0_1727237459844.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_emotion_yenicerisgk_pipeline_en_5.5.0_3.0_1727237459844.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_emotion_yenicerisgk_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_emotion_yenicerisgk_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_emotion_yenicerisgk_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/yeniceriSGK/distilbert-emotion + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-distilbert_finetuned_imdb_neural_net_rahul_en.md b/docs/_posts/ahmedlone127/2024-09-25-distilbert_finetuned_imdb_neural_net_rahul_en.md new file mode 100644 index 00000000000000..f043707d37e842 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-distilbert_finetuned_imdb_neural_net_rahul_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distilbert_finetuned_imdb_neural_net_rahul BertEmbeddings from neural-net-rahul +author: John Snow Labs +name: distilbert_finetuned_imdb_neural_net_rahul +date: 2024-09-25 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_imdb_neural_net_rahul` is a English model originally trained by neural-net-rahul. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_imdb_neural_net_rahul_en_5.5.0_3.0_1727231514095.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_imdb_neural_net_rahul_en_5.5.0_3.0_1727231514095.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("distilbert_finetuned_imdb_neural_net_rahul","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("distilbert_finetuned_imdb_neural_net_rahul","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_imdb_neural_net_rahul| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|403.6 MB| + +## References + +https://huggingface.co/neural-net-rahul/distilbert-finetuned-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-distilbert_finetuned_imdb_neural_net_rahul_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-distilbert_finetuned_imdb_neural_net_rahul_pipeline_en.md new file mode 100644 index 00000000000000..fe06b8f1f3fb92 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-distilbert_finetuned_imdb_neural_net_rahul_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_finetuned_imdb_neural_net_rahul_pipeline pipeline BertEmbeddings from neural-net-rahul +author: John Snow Labs +name: distilbert_finetuned_imdb_neural_net_rahul_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_finetuned_imdb_neural_net_rahul_pipeline` is a English model originally trained by neural-net-rahul. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_imdb_neural_net_rahul_pipeline_en_5.5.0_3.0_1727231535128.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_finetuned_imdb_neural_net_rahul_pipeline_en_5.5.0_3.0_1727231535128.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_finetuned_imdb_neural_net_rahul_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_finetuned_imdb_neural_net_rahul_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_finetuned_imdb_neural_net_rahul_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/neural-net-rahul/distilbert-finetuned-imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-distilbert_portuguese_cased_finetuned_quantity_en.md b/docs/_posts/ahmedlone127/2024-09-25-distilbert_portuguese_cased_finetuned_quantity_en.md new file mode 100644 index 00000000000000..7a960caae1d799 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-distilbert_portuguese_cased_finetuned_quantity_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English distilbert_portuguese_cased_finetuned_quantity BertForSequenceClassification from alexia20816 +author: John Snow Labs +name: distilbert_portuguese_cased_finetuned_quantity +date: 2024-09-25 +tags: [bert, en, open_source, sequence_classification, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_portuguese_cased_finetuned_quantity` is a English model originally trained by alexia20816. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_portuguese_cased_finetuned_quantity_en_5.5.0_3.0_1727235868972.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_portuguese_cased_finetuned_quantity_en_5.5.0_3.0_1727235868972.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +tokenizer = Tokenizer()\ + .setInputCols("document")\ + .setOutputCol("token") + +sequenceClassifier = BertForSequenceClassification.pretrained("distilbert_portuguese_cased_finetuned_quantity","en")\ + .setInputCols(["document","token"])\ + .setOutputCol("class") + +pipeline = Pipeline().setStages([document_assembler, tokenizer, sequenceClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("distilbert_portuguese_cased_finetuned_quantity","en") + .setInputCols(Array("document","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_portuguese_cased_finetuned_quantity| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|279.9 MB| + +## References + +References + +https://huggingface.co/alexia20816/distilbert-portuguese-cased-finetuned-quantity \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-distilbert_portuguese_cased_finetuned_quantity_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-distilbert_portuguese_cased_finetuned_quantity_pipeline_en.md new file mode 100644 index 00000000000000..1794a824f0de41 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-distilbert_portuguese_cased_finetuned_quantity_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distilbert_portuguese_cased_finetuned_quantity_pipeline pipeline BertForSequenceClassification from xc2450 +author: John Snow Labs +name: distilbert_portuguese_cased_finetuned_quantity_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distilbert_portuguese_cased_finetuned_quantity_pipeline` is a English model originally trained by xc2450. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distilbert_portuguese_cased_finetuned_quantity_pipeline_en_5.5.0_3.0_1727235884228.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distilbert_portuguese_cased_finetuned_quantity_pipeline_en_5.5.0_3.0_1727235884228.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distilbert_portuguese_cased_finetuned_quantity_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distilbert_portuguese_cased_finetuned_quantity_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distilbert_portuguese_cased_finetuned_quantity_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|279.9 MB| + +## References + +https://huggingface.co/xc2450/distilbert-portuguese-cased-finetuned-quantity + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-distillbert_distilled_ag_news_en.md b/docs/_posts/ahmedlone127/2024-09-25-distillbert_distilled_ag_news_en.md new file mode 100644 index 00000000000000..d25c7a97ce77cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-distillbert_distilled_ag_news_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English distillbert_distilled_ag_news BertForSequenceClassification from odunola +author: John Snow Labs +name: distillbert_distilled_ag_news +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distillbert_distilled_ag_news` is a English model originally trained by odunola. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distillbert_distilled_ag_news_en_5.5.0_3.0_1727264201881.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distillbert_distilled_ag_news_en_5.5.0_3.0_1727264201881.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("distillbert_distilled_ag_news","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("distillbert_distilled_ag_news", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distillbert_distilled_ag_news| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|65.8 MB| + +## References + +https://huggingface.co/odunola/distillbert-distilled-ag-news \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-distillbert_distilled_ag_news_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-distillbert_distilled_ag_news_pipeline_en.md new file mode 100644 index 00000000000000..0dbdd8a357bf1e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-distillbert_distilled_ag_news_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English distillbert_distilled_ag_news_pipeline pipeline BertForSequenceClassification from odunola +author: John Snow Labs +name: distillbert_distilled_ag_news_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`distillbert_distilled_ag_news_pipeline` is a English model originally trained by odunola. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/distillbert_distilled_ag_news_pipeline_en_5.5.0_3.0_1727264205159.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/distillbert_distilled_ag_news_pipeline_en_5.5.0_3.0_1727264205159.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("distillbert_distilled_ag_news_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("distillbert_distilled_ag_news_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|distillbert_distilled_ag_news_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|65.8 MB| + +## References + +https://huggingface.co/odunola/distillbert-distilled-ag-news + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-e5_large_mnli_en.md b/docs/_posts/ahmedlone127/2024-09-25-e5_large_mnli_en.md new file mode 100644 index 00000000000000..4ba192ba479c33 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-e5_large_mnli_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English e5_large_mnli BertForZeroShotClassification from mjwong +author: John Snow Labs +name: e5_large_mnli +date: 2024-09-25 +tags: [en, open_source, onnx, zero_shot, bert] +task: Zero-Shot Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForZeroShotClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForZeroShotClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`e5_large_mnli` is a English model originally trained by mjwong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/e5_large_mnli_en_5.5.0_3.0_1727222972046.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/e5_large_mnli_en_5.5.0_3.0_1727222972046.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +zeroShotClassifier = BertForZeroShotClassification.pretrained("e5_large_mnli","en") \ + .setInputCols(["document","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, zeroShotClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val zeroShotClassifier = BertForZeroShotClassification.pretrained("e5_large_mnli", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, zeroShotClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|e5_large_mnli| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/mjwong/e5-large-mnli \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-e5_large_mnli_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-e5_large_mnli_pipeline_en.md new file mode 100644 index 00000000000000..f6596e63e770dd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-e5_large_mnli_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English e5_large_mnli_pipeline pipeline BertForZeroShotClassification from mjwong +author: John Snow Labs +name: e5_large_mnli_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Zero-Shot Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForZeroShotClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`e5_large_mnli_pipeline` is a English model originally trained by mjwong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/e5_large_mnli_pipeline_en_5.5.0_3.0_1727223033526.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/e5_large_mnli_pipeline_en_5.5.0_3.0_1727223033526.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("e5_large_mnli_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("e5_large_mnli_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|e5_large_mnli_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/mjwong/e5-large-mnli + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForZeroShotClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-english_astitchtask1a_bertbasecased_falsetrue_0_3_best_en.md b/docs/_posts/ahmedlone127/2024-09-25-english_astitchtask1a_bertbasecased_falsetrue_0_3_best_en.md new file mode 100644 index 00000000000000..15db0a3b61b507 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-english_astitchtask1a_bertbasecased_falsetrue_0_3_best_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English english_astitchtask1a_bertbasecased_falsetrue_0_3_best BertForSequenceClassification from harish +author: John Snow Labs +name: english_astitchtask1a_bertbasecased_falsetrue_0_3_best +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`english_astitchtask1a_bertbasecased_falsetrue_0_3_best` is a English model originally trained by harish. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/english_astitchtask1a_bertbasecased_falsetrue_0_3_best_en_5.5.0_3.0_1727277609277.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/english_astitchtask1a_bertbasecased_falsetrue_0_3_best_en_5.5.0_3.0_1727277609277.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("english_astitchtask1a_bertbasecased_falsetrue_0_3_best","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("english_astitchtask1a_bertbasecased_falsetrue_0_3_best", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|english_astitchtask1a_bertbasecased_falsetrue_0_3_best| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/harish/EN-AStitchTask1A-BERTBaseCased-FalseTrue-0-3-BEST \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-english_base_en.md b/docs/_posts/ahmedlone127/2024-09-25-english_base_en.md new file mode 100644 index 00000000000000..c70489d6a1f7d9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-english_base_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English english_base BertForTokenClassification from mudes +author: John Snow Labs +name: english_base +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`english_base` is a English model originally trained by mudes. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/english_base_en_5.5.0_3.0_1727270402283.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/english_base_en_5.5.0_3.0_1727270402283.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("english_base","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("english_base", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|english_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.7 MB| + +## References + +https://huggingface.co/mudes/en-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-estbert128_rubric_et.md b/docs/_posts/ahmedlone127/2024-09-25-estbert128_rubric_et.md new file mode 100644 index 00000000000000..8db925642367d3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-estbert128_rubric_et.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Estonian estbert128_rubric BertForSequenceClassification from tartuNLP +author: John Snow Labs +name: estbert128_rubric +date: 2024-09-25 +tags: [et, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: et +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`estbert128_rubric` is a Estonian model originally trained by tartuNLP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/estbert128_rubric_et_5.5.0_3.0_1727272904136.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/estbert128_rubric_et_5.5.0_3.0_1727272904136.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("estbert128_rubric","et") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("estbert128_rubric", "et") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|estbert128_rubric| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|et| +|Size:|465.7 MB| + +## References + +https://huggingface.co/tartuNLP/EstBERT128_Rubric \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-estbert128_rubric_pipeline_et.md b/docs/_posts/ahmedlone127/2024-09-25-estbert128_rubric_pipeline_et.md new file mode 100644 index 00000000000000..40eb842769e69c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-estbert128_rubric_pipeline_et.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Estonian estbert128_rubric_pipeline pipeline BertForSequenceClassification from tartuNLP +author: John Snow Labs +name: estbert128_rubric_pipeline +date: 2024-09-25 +tags: [et, open_source, pipeline, onnx] +task: Text Classification +language: et +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`estbert128_rubric_pipeline` is a Estonian model originally trained by tartuNLP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/estbert128_rubric_pipeline_et_5.5.0_3.0_1727272932764.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/estbert128_rubric_pipeline_et_5.5.0_3.0_1727272932764.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("estbert128_rubric_pipeline", lang = "et") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("estbert128_rubric_pipeline", lang = "et") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|estbert128_rubric_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|et| +|Size:|465.7 MB| + +## References + +https://huggingface.co/tartuNLP/EstBERT128_Rubric + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-fake_news_classifier_en.md b/docs/_posts/ahmedlone127/2024-09-25-fake_news_classifier_en.md new file mode 100644 index 00000000000000..6784665c6dd8ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-fake_news_classifier_en.md @@ -0,0 +1,96 @@ +--- +layout: model +title: English fake_news_classifier RoBertaForSequenceClassification from T0asty +author: John Snow Labs +name: fake_news_classifier +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fake_news_classifier` is a English model originally trained by T0asty. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fake_news_classifier_en_5.5.0_3.0_1727242442851.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fake_news_classifier_en_5.5.0_3.0_1727242442851.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("fake_news_classifier","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("fake_news_classifier", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fake_news_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +References + +https://huggingface.co/T0asty/fake-news-classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-fake_news_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-fake_news_classifier_pipeline_en.md new file mode 100644 index 00000000000000..3ea10846f2c12f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-fake_news_classifier_pipeline_en.md @@ -0,0 +1,72 @@ +--- +layout: model +title: English fake_news_classifier_pipeline pipeline RoBertaForSequenceClassification from T0asty +author: John Snow Labs +name: fake_news_classifier_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fake_news_classifier_pipeline` is a English model originally trained by T0asty. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fake_news_classifier_pipeline_en_5.5.0_3.0_1727242465113.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fake_news_classifier_pipeline_en_5.5.0_3.0_1727242465113.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +pipeline = PretrainedPipeline("fake_news_classifier_pipeline", lang = "en") +annotations = pipeline.transform(df) +``` +```scala +val pipeline = new PretrainedPipeline("fake_news_classifier_pipeline", lang = "en") +val annotations = pipeline.transform(df) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fake_news_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +References + +https://huggingface.co/T0asty/fake-news-classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-fakenews_bert_base_cased_denyol_en.md b/docs/_posts/ahmedlone127/2024-09-25-fakenews_bert_base_cased_denyol_en.md new file mode 100644 index 00000000000000..8d1914a2908cb4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-fakenews_bert_base_cased_denyol_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English fakenews_bert_base_cased_denyol BertForSequenceClassification from Denyol +author: John Snow Labs +name: fakenews_bert_base_cased_denyol +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fakenews_bert_base_cased_denyol` is a English model originally trained by Denyol. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fakenews_bert_base_cased_denyol_en_5.5.0_3.0_1727276634895.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fakenews_bert_base_cased_denyol_en_5.5.0_3.0_1727276634895.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("fakenews_bert_base_cased_denyol","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("fakenews_bert_base_cased_denyol", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fakenews_bert_base_cased_denyol| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/Denyol/FakeNews-bert-base-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-fakenews_bert_base_cased_denyol_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-fakenews_bert_base_cased_denyol_pipeline_en.md new file mode 100644 index 00000000000000..c1b9705f1136e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-fakenews_bert_base_cased_denyol_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English fakenews_bert_base_cased_denyol_pipeline pipeline BertForSequenceClassification from Denyol +author: John Snow Labs +name: fakenews_bert_base_cased_denyol_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fakenews_bert_base_cased_denyol_pipeline` is a English model originally trained by Denyol. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fakenews_bert_base_cased_denyol_pipeline_en_5.5.0_3.0_1727276657063.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fakenews_bert_base_cased_denyol_pipeline_en_5.5.0_3.0_1727276657063.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("fakenews_bert_base_cased_denyol_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("fakenews_bert_base_cased_denyol_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fakenews_bert_base_cased_denyol_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/Denyol/FakeNews-bert-base-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-favs_filtersort_multilabel_classification_bert_base_cased_jacquesle_en.md b/docs/_posts/ahmedlone127/2024-09-25-favs_filtersort_multilabel_classification_bert_base_cased_jacquesle_en.md new file mode 100644 index 00000000000000..2b707721043224 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-favs_filtersort_multilabel_classification_bert_base_cased_jacquesle_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English favs_filtersort_multilabel_classification_bert_base_cased_jacquesle BertForSequenceClassification from jacquesle +author: John Snow Labs +name: favs_filtersort_multilabel_classification_bert_base_cased_jacquesle +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`favs_filtersort_multilabel_classification_bert_base_cased_jacquesle` is a English model originally trained by jacquesle. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/favs_filtersort_multilabel_classification_bert_base_cased_jacquesle_en_5.5.0_3.0_1727276870556.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/favs_filtersort_multilabel_classification_bert_base_cased_jacquesle_en_5.5.0_3.0_1727276870556.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("favs_filtersort_multilabel_classification_bert_base_cased_jacquesle","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("favs_filtersort_multilabel_classification_bert_base_cased_jacquesle", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|favs_filtersort_multilabel_classification_bert_base_cased_jacquesle| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/jacquesle/favs-filtersort-multilabel-classification-bert-base-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407_en.md b/docs/_posts/ahmedlone127/2024-09-25-favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407_en.md new file mode 100644 index 00000000000000..e4b5eec690125f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407 BertForSequenceClassification from nguyenkhoa2407 +author: John Snow Labs +name: favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407` is a English model originally trained by nguyenkhoa2407. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407_en_5.5.0_3.0_1727277882852.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407_en_5.5.0_3.0_1727277882852.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/nguyenkhoa2407/favs-filtersort-multilabel-classification-bert-base-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407_pipeline_en.md new file mode 100644 index 00000000000000..b6a3be0b046c95 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407_pipeline pipeline BertForSequenceClassification from nguyenkhoa2407 +author: John Snow Labs +name: favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407_pipeline` is a English model originally trained by nguyenkhoa2407. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407_pipeline_en_5.5.0_3.0_1727277904230.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407_pipeline_en_5.5.0_3.0_1727277904230.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|favs_filtersort_multilabel_classification_bert_base_cased_nguyenkhoa2407_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/nguyenkhoa2407/favs-filtersort-multilabel-classification-bert-base-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-finbert_tuned_en.md b/docs/_posts/ahmedlone127/2024-09-25-finbert_tuned_en.md new file mode 100644 index 00000000000000..8b0a6f6e5af445 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-finbert_tuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finbert_tuned BertForSequenceClassification from manvik28 +author: John Snow Labs +name: finbert_tuned +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finbert_tuned` is a English model originally trained by manvik28. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finbert_tuned_en_5.5.0_3.0_1727285196421.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finbert_tuned_en_5.5.0_3.0_1727285196421.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("finbert_tuned","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("finbert_tuned", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finbert_tuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/manvik28/FinBERT_Tuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-fine_tuned_bert_czech_wikann_en.md b/docs/_posts/ahmedlone127/2024-09-25-fine_tuned_bert_czech_wikann_en.md new file mode 100644 index 00000000000000..5caf9ab545e613 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-fine_tuned_bert_czech_wikann_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English fine_tuned_bert_czech_wikann BertForTokenClassification from stulcrad +author: John Snow Labs +name: fine_tuned_bert_czech_wikann +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`fine_tuned_bert_czech_wikann` is a English model originally trained by stulcrad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/fine_tuned_bert_czech_wikann_en_5.5.0_3.0_1727275485253.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/fine_tuned_bert_czech_wikann_en_5.5.0_3.0_1727275485253.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("fine_tuned_bert_czech_wikann","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("fine_tuned_bert_czech_wikann", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|fine_tuned_bert_czech_wikann| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|665.1 MB| + +## References + +https://huggingface.co/stulcrad/fine_tuned_BERT_cs_wikann \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-finetuned_bert_base_on_shemo_transcripts_en.md b/docs/_posts/ahmedlone127/2024-09-25-finetuned_bert_base_on_shemo_transcripts_en.md new file mode 100644 index 00000000000000..797d3919c82c97 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-finetuned_bert_base_on_shemo_transcripts_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned_bert_base_on_shemo_transcripts BertForSequenceClassification from minoosh +author: John Snow Labs +name: finetuned_bert_base_on_shemo_transcripts +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_bert_base_on_shemo_transcripts` is a English model originally trained by minoosh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_bert_base_on_shemo_transcripts_en_5.5.0_3.0_1727263641848.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_bert_base_on_shemo_transcripts_en_5.5.0_3.0_1727263641848.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("finetuned_bert_base_on_shemo_transcripts","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("finetuned_bert_base_on_shemo_transcripts", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_bert_base_on_shemo_transcripts| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/minoosh/finetuned_bert-base_on_shEMO_transcripts \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-finetuned_bert_base_uncased_olivernyu_en.md b/docs/_posts/ahmedlone127/2024-09-25-finetuned_bert_base_uncased_olivernyu_en.md new file mode 100644 index 00000000000000..b5674e748644e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-finetuned_bert_base_uncased_olivernyu_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuned_bert_base_uncased_olivernyu BertForSequenceClassification from Olivernyu +author: John Snow Labs +name: finetuned_bert_base_uncased_olivernyu +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuned_bert_base_uncased_olivernyu` is a English model originally trained by Olivernyu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuned_bert_base_uncased_olivernyu_en_5.5.0_3.0_1727263622137.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuned_bert_base_uncased_olivernyu_en_5.5.0_3.0_1727263622137.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("finetuned_bert_base_uncased_olivernyu","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("finetuned_bert_base_uncased_olivernyu", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuned_bert_base_uncased_olivernyu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Olivernyu/finetuned_bert_base_uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-finetuning_classification_model_3000_samples_en.md b/docs/_posts/ahmedlone127/2024-09-25-finetuning_classification_model_3000_samples_en.md new file mode 100644 index 00000000000000..d47bd1c26b6d0f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-finetuning_classification_model_3000_samples_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_classification_model_3000_samples BertForSequenceClassification from GMW123 +author: John Snow Labs +name: finetuning_classification_model_3000_samples +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_classification_model_3000_samples` is a English model originally trained by GMW123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_classification_model_3000_samples_en_5.5.0_3.0_1727254007618.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_classification_model_3000_samples_en_5.5.0_3.0_1727254007618.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("finetuning_classification_model_3000_samples","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("finetuning_classification_model_3000_samples", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_classification_model_3000_samples| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|84.8 MB| + +## References + +https://huggingface.co/GMW123/finetuning-classification-model-3000-samples \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-finetuning_classification_model_3000_samples_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-finetuning_classification_model_3000_samples_pipeline_en.md new file mode 100644 index 00000000000000..0d33b112fe55d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-finetuning_classification_model_3000_samples_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_classification_model_3000_samples_pipeline pipeline BertForSequenceClassification from GMW123 +author: John Snow Labs +name: finetuning_classification_model_3000_samples_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_classification_model_3000_samples_pipeline` is a English model originally trained by GMW123. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_classification_model_3000_samples_pipeline_en_5.5.0_3.0_1727254011991.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_classification_model_3000_samples_pipeline_en_5.5.0_3.0_1727254011991.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_classification_model_3000_samples_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_classification_model_3000_samples_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_classification_model_3000_samples_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|84.8 MB| + +## References + +https://huggingface.co/GMW123/finetuning-classification-model-3000-samples + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-finetuning_sentiment_analysis_bert2epoch_en.md b/docs/_posts/ahmedlone127/2024-09-25-finetuning_sentiment_analysis_bert2epoch_en.md new file mode 100644 index 00000000000000..e65d2a1d97aaa0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-finetuning_sentiment_analysis_bert2epoch_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English finetuning_sentiment_analysis_bert2epoch BertForSequenceClassification from aruca +author: John Snow Labs +name: finetuning_sentiment_analysis_bert2epoch +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_analysis_bert2epoch` is a English model originally trained by aruca. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_analysis_bert2epoch_en_5.5.0_3.0_1727266348143.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_analysis_bert2epoch_en_5.5.0_3.0_1727266348143.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("finetuning_sentiment_analysis_bert2epoch","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("finetuning_sentiment_analysis_bert2epoch", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_analysis_bert2epoch| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/aruca/finetuning-sentiment-analysis-bert2epoch \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-finetuning_sentiment_analysis_bert2epoch_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-finetuning_sentiment_analysis_bert2epoch_pipeline_en.md new file mode 100644 index 00000000000000..31d0d4e7ab1e5b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-finetuning_sentiment_analysis_bert2epoch_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English finetuning_sentiment_analysis_bert2epoch_pipeline pipeline BertForSequenceClassification from aruca +author: John Snow Labs +name: finetuning_sentiment_analysis_bert2epoch_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`finetuning_sentiment_analysis_bert2epoch_pipeline` is a English model originally trained by aruca. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_analysis_bert2epoch_pipeline_en_5.5.0_3.0_1727266370972.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/finetuning_sentiment_analysis_bert2epoch_pipeline_en_5.5.0_3.0_1727266370972.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("finetuning_sentiment_analysis_bert2epoch_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("finetuning_sentiment_analysis_bert2epoch_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|finetuning_sentiment_analysis_bert2epoch_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/aruca/finetuning-sentiment-analysis-bert2epoch + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-gbert_germeval_2021_de.md b/docs/_posts/ahmedlone127/2024-09-25-gbert_germeval_2021_de.md new file mode 100644 index 00000000000000..866409e61b6aa6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-gbert_germeval_2021_de.md @@ -0,0 +1,94 @@ +--- +layout: model +title: German gbert_germeval_2021 BertForSequenceClassification from shahrukhx01 +author: John Snow Labs +name: gbert_germeval_2021 +date: 2024-09-25 +tags: [de, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gbert_germeval_2021` is a German model originally trained by shahrukhx01. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gbert_germeval_2021_de_5.5.0_3.0_1727286926871.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gbert_germeval_2021_de_5.5.0_3.0_1727286926871.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("gbert_germeval_2021","de") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("gbert_germeval_2021", "de") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gbert_germeval_2021| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|de| +|Size:|412.0 MB| + +## References + +https://huggingface.co/shahrukhx01/gbert-germeval-2021 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-gbert_germeval_2021_pipeline_de.md b/docs/_posts/ahmedlone127/2024-09-25-gbert_germeval_2021_pipeline_de.md new file mode 100644 index 00000000000000..c7b90831c5807d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-gbert_germeval_2021_pipeline_de.md @@ -0,0 +1,70 @@ +--- +layout: model +title: German gbert_germeval_2021_pipeline pipeline BertForSequenceClassification from shahrukhx01 +author: John Snow Labs +name: gbert_germeval_2021_pipeline +date: 2024-09-25 +tags: [de, open_source, pipeline, onnx] +task: Text Classification +language: de +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`gbert_germeval_2021_pipeline` is a German model originally trained by shahrukhx01. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/gbert_germeval_2021_pipeline_de_5.5.0_3.0_1727286947949.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/gbert_germeval_2021_pipeline_de_5.5.0_3.0_1727286947949.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("gbert_germeval_2021_pipeline", lang = "de") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("gbert_germeval_2021_pipeline", lang = "de") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|gbert_germeval_2021_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|de| +|Size:|412.0 MB| + +## References + +https://huggingface.co/shahrukhx01/gbert-germeval-2021 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-genome_finder_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-genome_finder_pipeline_en.md new file mode 100644 index 00000000000000..165761ed5c5a43 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-genome_finder_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English genome_finder_pipeline pipeline BertForSequenceClassification from rdhinaz +author: John Snow Labs +name: genome_finder_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`genome_finder_pipeline` is a English model originally trained by rdhinaz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/genome_finder_pipeline_en_5.5.0_3.0_1727273166754.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/genome_finder_pipeline_en_5.5.0_3.0_1727273166754.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("genome_finder_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("genome_finder_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|genome_finder_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/rdhinaz/genome-finder + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-geotrend_10_epochs_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-geotrend_10_epochs_pipeline_en.md new file mode 100644 index 00000000000000..73027b44b68277 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-geotrend_10_epochs_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English geotrend_10_epochs_pipeline pipeline BertForTokenClassification from Azizun +author: John Snow Labs +name: geotrend_10_epochs_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`geotrend_10_epochs_pipeline` is a English model originally trained by Azizun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/geotrend_10_epochs_pipeline_en_5.5.0_3.0_1727281877953.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/geotrend_10_epochs_pipeline_en_5.5.0_3.0_1727281877953.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("geotrend_10_epochs_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("geotrend_10_epochs_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|geotrend_10_epochs_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|344.9 MB| + +## References + +https://huggingface.co/Azizun/Geotrend-10-epochs + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-hate_ita_it.md b/docs/_posts/ahmedlone127/2024-09-25-hate_ita_it.md new file mode 100644 index 00000000000000..530bcf3108b171 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-hate_ita_it.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Italian hate_ita XlmRoBertaForSequenceClassification from MilaNLProc +author: John Snow Labs +name: hate_ita +date: 2024-09-25 +tags: [it, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_ita` is a Italian model originally trained by MilaNLProc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_ita_it_5.5.0_3.0_1727229657021.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_ita_it_5.5.0_3.0_1727229657021.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("hate_ita","it") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("hate_ita", "it") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_ita| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|it| +|Size:|1.0 GB| + +## References + +https://huggingface.co/MilaNLProc/hate-ita \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-hate_ita_pipeline_it.md b/docs/_posts/ahmedlone127/2024-09-25-hate_ita_pipeline_it.md new file mode 100644 index 00000000000000..810578872a99ec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-hate_ita_pipeline_it.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Italian hate_ita_pipeline pipeline XlmRoBertaForSequenceClassification from MilaNLProc +author: John Snow Labs +name: hate_ita_pipeline +date: 2024-09-25 +tags: [it, open_source, pipeline, onnx] +task: Text Classification +language: it +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_ita_pipeline` is a Italian model originally trained by MilaNLProc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_ita_pipeline_it_5.5.0_3.0_1727229711840.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_ita_pipeline_it_5.5.0_3.0_1727229711840.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hate_ita_pipeline", lang = "it") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hate_ita_pipeline", lang = "it") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_ita_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|it| +|Size:|1.0 GB| + +## References + +https://huggingface.co/MilaNLProc/hate-ita + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-hate_speech_slo_sl.md b/docs/_posts/ahmedlone127/2024-09-25-hate_speech_slo_sl.md new file mode 100644 index 00000000000000..f0c299a0500413 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-hate_speech_slo_sl.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Slovenian hate_speech_slo BertForSequenceClassification from IMSyPP +author: John Snow Labs +name: hate_speech_slo +date: 2024-09-25 +tags: [sl, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: sl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hate_speech_slo` is a Slovenian model originally trained by IMSyPP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hate_speech_slo_sl_5.5.0_3.0_1727245720551.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hate_speech_slo_sl_5.5.0_3.0_1727245720551.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("hate_speech_slo","sl") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("hate_speech_slo", "sl") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hate_speech_slo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|sl| +|Size:|465.7 MB| + +## References + +https://huggingface.co/IMSyPP/hate_speech_slo \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-hatexplain_ds_labeled_001_en.md b/docs/_posts/ahmedlone127/2024-09-25-hatexplain_ds_labeled_001_en.md new file mode 100644 index 00000000000000..ea03fcbaf50e57 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-hatexplain_ds_labeled_001_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hatexplain_ds_labeled_001 BertForSequenceClassification from jkhan447 +author: John Snow Labs +name: hatexplain_ds_labeled_001 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hatexplain_ds_labeled_001` is a English model originally trained by jkhan447. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hatexplain_ds_labeled_001_en_5.5.0_3.0_1727267777367.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hatexplain_ds_labeled_001_en_5.5.0_3.0_1727267777367.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("hatexplain_ds_labeled_001","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("hatexplain_ds_labeled_001", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hatexplain_ds_labeled_001| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/jkhan447/HateXplain-DS-labeled-001 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-hatexplain_weighted_majority_labeled_en.md b/docs/_posts/ahmedlone127/2024-09-25-hatexplain_weighted_majority_labeled_en.md new file mode 100644 index 00000000000000..c974f958958af5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-hatexplain_weighted_majority_labeled_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English hatexplain_weighted_majority_labeled BertForSequenceClassification from jkhan447 +author: John Snow Labs +name: hatexplain_weighted_majority_labeled +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hatexplain_weighted_majority_labeled` is a English model originally trained by jkhan447. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hatexplain_weighted_majority_labeled_en_5.5.0_3.0_1727268331231.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hatexplain_weighted_majority_labeled_en_5.5.0_3.0_1727268331231.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("hatexplain_weighted_majority_labeled","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("hatexplain_weighted_majority_labeled", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hatexplain_weighted_majority_labeled| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/jkhan447/HateXplain-weighted-majority-labeled \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-hindi_topic_all_doc_hi.md b/docs/_posts/ahmedlone127/2024-09-25-hindi_topic_all_doc_hi.md new file mode 100644 index 00000000000000..1552c41fda6e5a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-hindi_topic_all_doc_hi.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Hindi hindi_topic_all_doc BertForSequenceClassification from l3cube-pune +author: John Snow Labs +name: hindi_topic_all_doc +date: 2024-09-25 +tags: [hi, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hindi_topic_all_doc` is a Hindi model originally trained by l3cube-pune. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hindi_topic_all_doc_hi_5.5.0_3.0_1727238129237.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hindi_topic_all_doc_hi_5.5.0_3.0_1727238129237.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("hindi_topic_all_doc","hi") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("hindi_topic_all_doc", "hi") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hindi_topic_all_doc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|hi| +|Size:|892.9 MB| + +## References + +https://huggingface.co/l3cube-pune/hindi-topic-all-doc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-hindi_topic_all_doc_pipeline_hi.md b/docs/_posts/ahmedlone127/2024-09-25-hindi_topic_all_doc_pipeline_hi.md new file mode 100644 index 00000000000000..bc924212b5bea6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-hindi_topic_all_doc_pipeline_hi.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Hindi hindi_topic_all_doc_pipeline pipeline BertForSequenceClassification from l3cube-pune +author: John Snow Labs +name: hindi_topic_all_doc_pipeline +date: 2024-09-25 +tags: [hi, open_source, pipeline, onnx] +task: Text Classification +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`hindi_topic_all_doc_pipeline` is a Hindi model originally trained by l3cube-pune. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/hindi_topic_all_doc_pipeline_hi_5.5.0_3.0_1727238174674.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/hindi_topic_all_doc_pipeline_hi_5.5.0_3.0_1727238174674.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("hindi_topic_all_doc_pipeline", lang = "hi") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("hindi_topic_all_doc_pipeline", lang = "hi") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|hindi_topic_all_doc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hi| +|Size:|892.9 MB| + +## References + +https://huggingface.co/l3cube-pune/hindi-topic-all-doc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-ic_en.md b/docs/_posts/ahmedlone127/2024-09-25-ic_en.md new file mode 100644 index 00000000000000..d6ac54880b0b54 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-ic_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ic BertForSequenceClassification from JohnDoe70 +author: John Snow Labs +name: ic +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ic` is a English model originally trained by JohnDoe70. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ic_en_5.5.0_3.0_1727261919875.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ic_en_5.5.0_3.0_1727261919875.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("ic","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("ic", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ic| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|410.0 MB| + +## References + +https://huggingface.co/JohnDoe70/ic \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-ic_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-ic_pipeline_en.md new file mode 100644 index 00000000000000..e78905fdff9c9e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-ic_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ic_pipeline pipeline BertForSequenceClassification from JohnDoe70 +author: John Snow Labs +name: ic_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ic_pipeline` is a English model originally trained by JohnDoe70. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ic_pipeline_en_5.5.0_3.0_1727261941152.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ic_pipeline_en_5.5.0_3.0_1727261941152.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ic_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ic_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ic_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|410.1 MB| + +## References + +https://huggingface.co/JohnDoe70/ic + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-ideology_facebookai_xlm_roberta_large_en.md b/docs/_posts/ahmedlone127/2024-09-25-ideology_facebookai_xlm_roberta_large_en.md new file mode 100644 index 00000000000000..3cd10d2cb4e893 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-ideology_facebookai_xlm_roberta_large_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ideology_facebookai_xlm_roberta_large RoBertaForSequenceClassification from juan-glez29 +author: John Snow Labs +name: ideology_facebookai_xlm_roberta_large +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ideology_facebookai_xlm_roberta_large` is a English model originally trained by juan-glez29. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ideology_facebookai_xlm_roberta_large_en_5.5.0_3.0_1727233724111.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ideology_facebookai_xlm_roberta_large_en_5.5.0_3.0_1727233724111.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("ideology_facebookai_xlm_roberta_large","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("ideology_facebookai_xlm_roberta_large", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ideology_facebookai_xlm_roberta_large| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|406.8 MB| + +## References + +https://huggingface.co/juan-glez29/ideology-FacebookAI-xlm-roberta-large \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-incel_alberto_en.md b/docs/_posts/ahmedlone127/2024-09-25-incel_alberto_en.md new file mode 100644 index 00000000000000..ef8f218f19b26b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-incel_alberto_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English incel_alberto BertEmbeddings from pgajo +author: John Snow Labs +name: incel_alberto +date: 2024-09-25 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`incel_alberto` is a English model originally trained by pgajo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/incel_alberto_en_5.5.0_3.0_1727243328830.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/incel_alberto_en_5.5.0_3.0_1727243328830.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("incel_alberto","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("incel_alberto","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|incel_alberto| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|688.7 MB| + +## References + +https://huggingface.co/pgajo/incel-alberto \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-incel_alberto_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-incel_alberto_pipeline_en.md new file mode 100644 index 00000000000000..264d3bab2b5c58 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-incel_alberto_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English incel_alberto_pipeline pipeline BertEmbeddings from pgajo +author: John Snow Labs +name: incel_alberto_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`incel_alberto_pipeline` is a English model originally trained by pgajo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/incel_alberto_pipeline_en_5.5.0_3.0_1727243364491.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/incel_alberto_pipeline_en_5.5.0_3.0_1727243364491.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("incel_alberto_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("incel_alberto_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|incel_alberto_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|688.8 MB| + +## References + +https://huggingface.co/pgajo/incel-alberto + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-jaberv2_en.md b/docs/_posts/ahmedlone127/2024-09-25-jaberv2_en.md new file mode 100644 index 00000000000000..a39fe61380e58f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-jaberv2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English jaberv2 BertEmbeddings from huawei-noah +author: John Snow Labs +name: jaberv2 +date: 2024-09-25 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`jaberv2` is a English model originally trained by huawei-noah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/jaberv2_en_5.5.0_3.0_1727258162659.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/jaberv2_en_5.5.0_3.0_1727258162659.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("jaberv2","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("jaberv2","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jaberv2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|504.8 MB| + +## References + +https://huggingface.co/huawei-noah/JABERv2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-jaberv2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-jaberv2_pipeline_en.md new file mode 100644 index 00000000000000..17b38f85b42a68 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-jaberv2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English jaberv2_pipeline pipeline BertEmbeddings from huawei-noah +author: John Snow Labs +name: jaberv2_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`jaberv2_pipeline` is a English model originally trained by huawei-noah. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/jaberv2_pipeline_en_5.5.0_3.0_1727258189766.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/jaberv2_pipeline_en_5.5.0_3.0_1727258189766.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("jaberv2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("jaberv2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|jaberv2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|504.9 MB| + +## References + +https://huggingface.co/huawei-noah/JABERv2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-khadija_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-khadija_ner_pipeline_en.md new file mode 100644 index 00000000000000..1b8ad571628d8c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-khadija_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English khadija_ner_pipeline pipeline BertForTokenClassification from didazz +author: John Snow Labs +name: khadija_ner_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`khadija_ner_pipeline` is a English model originally trained by didazz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/khadija_ner_pipeline_en_5.5.0_3.0_1727283171556.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/khadija_ner_pipeline_en_5.5.0_3.0_1727283171556.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("khadija_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("khadija_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|khadija_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/didazz/khadija_ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-kid_whisper_medium_english_myst_cslu_en.md b/docs/_posts/ahmedlone127/2024-09-25-kid_whisper_medium_english_myst_cslu_en.md new file mode 100644 index 00000000000000..057c4bd2512430 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-kid_whisper_medium_english_myst_cslu_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English kid_whisper_medium_english_myst_cslu WhisperForCTC from aadel4 +author: John Snow Labs +name: kid_whisper_medium_english_myst_cslu +date: 2024-09-25 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kid_whisper_medium_english_myst_cslu` is a English model originally trained by aadel4. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kid_whisper_medium_english_myst_cslu_en_5.5.0_3.0_1727227825039.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kid_whisper_medium_english_myst_cslu_en_5.5.0_3.0_1727227825039.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("kid_whisper_medium_english_myst_cslu","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("kid_whisper_medium_english_myst_cslu", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kid_whisper_medium_english_myst_cslu| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|4.8 GB| + +## References + +https://huggingface.co/aadel4/kid-whisper-medium-en-myst_cslu \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-klue_bert_base_senti_pipeline_ko.md b/docs/_posts/ahmedlone127/2024-09-25-klue_bert_base_senti_pipeline_ko.md new file mode 100644 index 00000000000000..48e4e569db8f3f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-klue_bert_base_senti_pipeline_ko.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Korean klue_bert_base_senti_pipeline pipeline BertForSequenceClassification from dudududukim +author: John Snow Labs +name: klue_bert_base_senti_pipeline +date: 2024-09-25 +tags: [ko, open_source, pipeline, onnx] +task: Text Classification +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`klue_bert_base_senti_pipeline` is a Korean model originally trained by dudududukim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/klue_bert_base_senti_pipeline_ko_5.5.0_3.0_1727242428540.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/klue_bert_base_senti_pipeline_ko_5.5.0_3.0_1727242428540.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("klue_bert_base_senti_pipeline", lang = "ko") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("klue_bert_base_senti_pipeline", lang = "ko") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|klue_bert_base_senti_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ko| +|Size:|414.8 MB| + +## References + +https://huggingface.co/dudududukim/klue-bert-base-senti + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-kor_naver_ner_name_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-kor_naver_ner_name_pipeline_en.md new file mode 100644 index 00000000000000..7508143860f69a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-kor_naver_ner_name_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English kor_naver_ner_name_pipeline pipeline BertForTokenClassification from joon09 +author: John Snow Labs +name: kor_naver_ner_name_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kor_naver_ner_name_pipeline` is a English model originally trained by joon09. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kor_naver_ner_name_pipeline_en_5.5.0_3.0_1727262973966.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kor_naver_ner_name_pipeline_en_5.5.0_3.0_1727262973966.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("kor_naver_ner_name_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("kor_naver_ner_name_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kor_naver_ner_name_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|441.2 MB| + +## References + +https://huggingface.co/joon09/kor-naver-ner-name + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-korean_albert_base_v1_ko.md b/docs/_posts/ahmedlone127/2024-09-25-korean_albert_base_v1_ko.md new file mode 100644 index 00000000000000..d149178124721b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-korean_albert_base_v1_ko.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Korean korean_albert_base_v1 BertEmbeddings from lots-o +author: John Snow Labs +name: korean_albert_base_v1 +date: 2024-09-25 +tags: [ko, open_source, onnx, embeddings, bert] +task: Embeddings +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`korean_albert_base_v1` is a Korean model originally trained by lots-o. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/korean_albert_base_v1_ko_5.5.0_3.0_1727236721501.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/korean_albert_base_v1_ko_5.5.0_3.0_1727236721501.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("korean_albert_base_v1","ko") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("korean_albert_base_v1","ko") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|korean_albert_base_v1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|ko| +|Size:|47.7 MB| + +## References + +https://huggingface.co/lots-o/ko-albert-base-v1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-korean_albert_base_v1_pipeline_ko.md b/docs/_posts/ahmedlone127/2024-09-25-korean_albert_base_v1_pipeline_ko.md new file mode 100644 index 00000000000000..02006a79b196ec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-korean_albert_base_v1_pipeline_ko.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Korean korean_albert_base_v1_pipeline pipeline BertEmbeddings from lots-o +author: John Snow Labs +name: korean_albert_base_v1_pipeline +date: 2024-09-25 +tags: [ko, open_source, pipeline, onnx] +task: Embeddings +language: ko +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`korean_albert_base_v1_pipeline` is a Korean model originally trained by lots-o. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/korean_albert_base_v1_pipeline_ko_5.5.0_3.0_1727236724245.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/korean_albert_base_v1_pipeline_ko_5.5.0_3.0_1727236724245.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("korean_albert_base_v1_pipeline", lang = "ko") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("korean_albert_base_v1_pipeline", lang = "ko") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|korean_albert_base_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ko| +|Size:|47.8 MB| + +## References + +https://huggingface.co/lots-o/ko-albert-base-v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-korean_disease_ner_en.md b/docs/_posts/ahmedlone127/2024-09-25-korean_disease_ner_en.md new file mode 100644 index 00000000000000..56df4728f29277 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-korean_disease_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English korean_disease_ner BertForTokenClassification from keonju +author: John Snow Labs +name: korean_disease_ner +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`korean_disease_ner` is a English model originally trained by keonju. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/korean_disease_ner_en_5.5.0_3.0_1727283023258.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/korean_disease_ner_en_5.5.0_3.0_1727283023258.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("korean_disease_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("korean_disease_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|korean_disease_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|412.4 MB| + +## References + +https://huggingface.co/keonju/korean_disease_ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-kyrgyz_language_ner_ky.md b/docs/_posts/ahmedlone127/2024-09-25-kyrgyz_language_ner_ky.md new file mode 100644 index 00000000000000..03ae008b7d2c9a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-kyrgyz_language_ner_ky.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Kirghiz, Kyrgyz kyrgyz_language_ner BertForTokenClassification from murat +author: John Snow Labs +name: kyrgyz_language_ner +date: 2024-09-25 +tags: [ky, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: ky +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kyrgyz_language_ner` is a Kirghiz, Kyrgyz model originally trained by murat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kyrgyz_language_ner_ky_5.5.0_3.0_1727249916067.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kyrgyz_language_ner_ky_5.5.0_3.0_1727249916067.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("kyrgyz_language_ner","ky") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("kyrgyz_language_ner", "ky") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kyrgyz_language_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|ky| +|Size:|665.1 MB| + +## References + +https://huggingface.co/murat/kyrgyz_language_NER \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-kyrgyz_language_ner_pipeline_ky.md b/docs/_posts/ahmedlone127/2024-09-25-kyrgyz_language_ner_pipeline_ky.md new file mode 100644 index 00000000000000..a05b622e466037 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-kyrgyz_language_ner_pipeline_ky.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Kirghiz, Kyrgyz kyrgyz_language_ner_pipeline pipeline BertForTokenClassification from murat +author: John Snow Labs +name: kyrgyz_language_ner_pipeline +date: 2024-09-25 +tags: [ky, open_source, pipeline, onnx] +task: Named Entity Recognition +language: ky +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`kyrgyz_language_ner_pipeline` is a Kirghiz, Kyrgyz model originally trained by murat. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/kyrgyz_language_ner_pipeline_ky_5.5.0_3.0_1727249950446.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/kyrgyz_language_ner_pipeline_ky_5.5.0_3.0_1727249950446.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("kyrgyz_language_ner_pipeline", lang = "ky") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("kyrgyz_language_ner_pipeline", lang = "ky") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|kyrgyz_language_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ky| +|Size:|665.1 MB| + +## References + +https://huggingface.co/murat/kyrgyz_language_NER + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-labse_malach_multilabel_en.md b/docs/_posts/ahmedlone127/2024-09-25-labse_malach_multilabel_en.md new file mode 100644 index 00000000000000..7dbb6a675c7dbe --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-labse_malach_multilabel_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English labse_malach_multilabel BertForSequenceClassification from ChrisBridges +author: John Snow Labs +name: labse_malach_multilabel +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`labse_malach_multilabel` is a English model originally trained by ChrisBridges. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/labse_malach_multilabel_en_5.5.0_3.0_1727240285805.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/labse_malach_multilabel_en_5.5.0_3.0_1727240285805.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("labse_malach_multilabel","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("labse_malach_multilabel", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|labse_malach_multilabel| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.8 GB| + +## References + +https://huggingface.co/ChrisBridges/labse-malach-multilabel \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-labse_malach_multilabel_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-labse_malach_multilabel_pipeline_en.md new file mode 100644 index 00000000000000..e02218ccab4d2a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-labse_malach_multilabel_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English labse_malach_multilabel_pipeline pipeline BertForSequenceClassification from ChrisBridges +author: John Snow Labs +name: labse_malach_multilabel_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`labse_malach_multilabel_pipeline` is a English model originally trained by ChrisBridges. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/labse_malach_multilabel_pipeline_en_5.5.0_3.0_1727240370019.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/labse_malach_multilabel_pipeline_en_5.5.0_3.0_1727240370019.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("labse_malach_multilabel_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("labse_malach_multilabel_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|labse_malach_multilabel_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.8 GB| + +## References + +https://huggingface.co/ChrisBridges/labse-malach-multilabel + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-legal_bert_samoan_gen1_large_summarized_chuvash_4_en.md b/docs/_posts/ahmedlone127/2024-09-25-legal_bert_samoan_gen1_large_summarized_chuvash_4_en.md new file mode 100644 index 00000000000000..0bded634167878 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-legal_bert_samoan_gen1_large_summarized_chuvash_4_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English legal_bert_samoan_gen1_large_summarized_chuvash_4 BertForSequenceClassification from wiorz +author: John Snow Labs +name: legal_bert_samoan_gen1_large_summarized_chuvash_4 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`legal_bert_samoan_gen1_large_summarized_chuvash_4` is a English model originally trained by wiorz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/legal_bert_samoan_gen1_large_summarized_chuvash_4_en_5.5.0_3.0_1727288632555.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/legal_bert_samoan_gen1_large_summarized_chuvash_4_en_5.5.0_3.0_1727288632555.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("legal_bert_samoan_gen1_large_summarized_chuvash_4","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("legal_bert_samoan_gen1_large_summarized_chuvash_4", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|legal_bert_samoan_gen1_large_summarized_chuvash_4| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/wiorz/legal_bert_sm_gen1_large_summarized_cv_4 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-malaysian_whisper_small_ms.md b/docs/_posts/ahmedlone127/2024-09-25-malaysian_whisper_small_ms.md new file mode 100644 index 00000000000000..a737dbefcb4701 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-malaysian_whisper_small_ms.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Malay (macrolanguage) malaysian_whisper_small WhisperForCTC from mesolitica +author: John Snow Labs +name: malaysian_whisper_small +date: 2024-09-25 +tags: [ms, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: ms +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`malaysian_whisper_small` is a Malay (macrolanguage) model originally trained by mesolitica. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/malaysian_whisper_small_ms_5.5.0_3.0_1727226895155.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/malaysian_whisper_small_ms_5.5.0_3.0_1727226895155.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("malaysian_whisper_small","ms") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("malaysian_whisper_small", "ms") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|malaysian_whisper_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|ms| +|Size:|856.3 MB| + +## References + +https://huggingface.co/mesolitica/malaysian-whisper-small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-malaysian_whisper_small_pipeline_ms.md b/docs/_posts/ahmedlone127/2024-09-25-malaysian_whisper_small_pipeline_ms.md new file mode 100644 index 00000000000000..bd25b793d07ec8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-malaysian_whisper_small_pipeline_ms.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Malay (macrolanguage) malaysian_whisper_small_pipeline pipeline WhisperForCTC from mesolitica +author: John Snow Labs +name: malaysian_whisper_small_pipeline +date: 2024-09-25 +tags: [ms, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: ms +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`malaysian_whisper_small_pipeline` is a Malay (macrolanguage) model originally trained by mesolitica. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/malaysian_whisper_small_pipeline_ms_5.5.0_3.0_1727227176898.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/malaysian_whisper_small_pipeline_ms_5.5.0_3.0_1727227176898.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("malaysian_whisper_small_pipeline", lang = "ms") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("malaysian_whisper_small_pipeline", lang = "ms") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|malaysian_whisper_small_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ms| +|Size:|856.3 MB| + +## References + +https://huggingface.co/mesolitica/malaysian-whisper-small + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-marathi_marh_val_f_mr.md b/docs/_posts/ahmedlone127/2024-09-25-marathi_marh_val_f_mr.md new file mode 100644 index 00000000000000..45106bb99ef809 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-marathi_marh_val_f_mr.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Marathi marathi_marh_val_f WhisperForCTC from simran14 +author: John Snow Labs +name: marathi_marh_val_f +date: 2024-09-25 +tags: [mr, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: mr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marathi_marh_val_f` is a Marathi model originally trained by simran14. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marathi_marh_val_f_mr_5.5.0_3.0_1727226055295.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marathi_marh_val_f_mr_5.5.0_3.0_1727226055295.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("marathi_marh_val_f","mr") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("marathi_marh_val_f", "mr") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marathi_marh_val_f| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|mr| +|Size:|1.7 GB| + +## References + +https://huggingface.co/simran14/mr-val-f \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-marathi_marh_val_f_pipeline_mr.md b/docs/_posts/ahmedlone127/2024-09-25-marathi_marh_val_f_pipeline_mr.md new file mode 100644 index 00000000000000..e8e091c0baa0b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-marathi_marh_val_f_pipeline_mr.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Marathi marathi_marh_val_f_pipeline pipeline WhisperForCTC from simran14 +author: John Snow Labs +name: marathi_marh_val_f_pipeline +date: 2024-09-25 +tags: [mr, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: mr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marathi_marh_val_f_pipeline` is a Marathi model originally trained by simran14. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marathi_marh_val_f_pipeline_mr_5.5.0_3.0_1727226147970.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marathi_marh_val_f_pipeline_mr_5.5.0_3.0_1727226147970.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marathi_marh_val_f_pipeline", lang = "mr") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marathi_marh_val_f_pipeline", lang = "mr") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marathi_marh_val_f_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|mr| +|Size:|1.7 GB| + +## References + +https://huggingface.co/simran14/mr-val-f + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-marbertv2_flat_seed_42_en.md b/docs/_posts/ahmedlone127/2024-09-25-marbertv2_flat_seed_42_en.md new file mode 100644 index 00000000000000..ccb777a558da41 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-marbertv2_flat_seed_42_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English marbertv2_flat_seed_42 BertForTokenClassification from ahmedoumar +author: John Snow Labs +name: marbertv2_flat_seed_42 +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marbertv2_flat_seed_42` is a English model originally trained by ahmedoumar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marbertv2_flat_seed_42_en_5.5.0_3.0_1727275573150.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marbertv2_flat_seed_42_en_5.5.0_3.0_1727275573150.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("marbertv2_flat_seed_42","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("marbertv2_flat_seed_42", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marbertv2_flat_seed_42| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|606.7 MB| + +## References + +https://huggingface.co/ahmedoumar/MARBERTv2_FLAT_SEED_42 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-marbertv2_flat_seed_42_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-marbertv2_flat_seed_42_pipeline_en.md new file mode 100644 index 00000000000000..644d1bc0f640aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-marbertv2_flat_seed_42_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English marbertv2_flat_seed_42_pipeline pipeline BertForTokenClassification from ahmedoumar +author: John Snow Labs +name: marbertv2_flat_seed_42_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`marbertv2_flat_seed_42_pipeline` is a English model originally trained by ahmedoumar. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/marbertv2_flat_seed_42_pipeline_en_5.5.0_3.0_1727275605543.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/marbertv2_flat_seed_42_pipeline_en_5.5.0_3.0_1727275605543.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("marbertv2_flat_seed_42_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("marbertv2_flat_seed_42_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|marbertv2_flat_seed_42_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|606.7 MB| + +## References + +https://huggingface.co/ahmedoumar/MARBERTv2_FLAT_SEED_42 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-matscibert_cner_en.md b/docs/_posts/ahmedlone127/2024-09-25-matscibert_cner_en.md new file mode 100644 index 00000000000000..2f91f8d26d5f29 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-matscibert_cner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English matscibert_cner BertForTokenClassification from nlp-magnets +author: John Snow Labs +name: matscibert_cner +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`matscibert_cner` is a English model originally trained by nlp-magnets. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/matscibert_cner_en_5.5.0_3.0_1727275895658.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/matscibert_cner_en_5.5.0_3.0_1727275895658.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("matscibert_cner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("matscibert_cner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|matscibert_cner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|409.9 MB| + +## References + +https://huggingface.co/nlp-magnets/matscibert-cner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-matscibert_cner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-matscibert_cner_pipeline_en.md new file mode 100644 index 00000000000000..273c9202218ec0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-matscibert_cner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English matscibert_cner_pipeline pipeline BertForTokenClassification from nlp-magnets +author: John Snow Labs +name: matscibert_cner_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`matscibert_cner_pipeline` is a English model originally trained by nlp-magnets. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/matscibert_cner_pipeline_en_5.5.0_3.0_1727275917628.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/matscibert_cner_pipeline_en_5.5.0_3.0_1727275917628.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("matscibert_cner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("matscibert_cner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|matscibert_cner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|410.0 MB| + +## References + +https://huggingface.co/nlp-magnets/matscibert-cner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-mbert_finetuned_sdgs_en.md b/docs/_posts/ahmedlone127/2024-09-25-mbert_finetuned_sdgs_en.md new file mode 100644 index 00000000000000..ddfd03f84fb913 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-mbert_finetuned_sdgs_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mbert_finetuned_sdgs BertForSequenceClassification from aadhistii +author: John Snow Labs +name: mbert_finetuned_sdgs +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mbert_finetuned_sdgs` is a English model originally trained by aadhistii. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mbert_finetuned_sdgs_en_5.5.0_3.0_1727277659365.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mbert_finetuned_sdgs_en_5.5.0_3.0_1727277659365.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("mbert_finetuned_sdgs","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("mbert_finetuned_sdgs", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mbert_finetuned_sdgs| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|667.3 MB| + +## References + +https://huggingface.co/aadhistii/mbert-finetuned-sdgs \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-memo_bert_wsd_01_en.md b/docs/_posts/ahmedlone127/2024-09-25-memo_bert_wsd_01_en.md new file mode 100644 index 00000000000000..b5897ec74530f3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-memo_bert_wsd_01_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English memo_bert_wsd_01 BertForSequenceClassification from yemen2016 +author: John Snow Labs +name: memo_bert_wsd_01 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`memo_bert_wsd_01` is a English model originally trained by yemen2016. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/memo_bert_wsd_01_en_5.5.0_3.0_1727285931602.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/memo_bert_wsd_01_en_5.5.0_3.0_1727285931602.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("memo_bert_wsd_01","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("memo_bert_wsd_01", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|memo_bert_wsd_01| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|410.3 MB| + +## References + +https://huggingface.co/yemen2016/MeMo_BERT-WSD-01 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-mitre_bert_base_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-mitre_bert_base_cased_pipeline_en.md new file mode 100644 index 00000000000000..ff0521e83a94ae --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-mitre_bert_base_cased_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mitre_bert_base_cased_pipeline pipeline BertForSequenceClassification from bencyc1129 +author: John Snow Labs +name: mitre_bert_base_cased_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mitre_bert_base_cased_pipeline` is a English model originally trained by bencyc1129. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mitre_bert_base_cased_pipeline_en_5.5.0_3.0_1727266950234.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mitre_bert_base_cased_pipeline_en_5.5.0_3.0_1727266950234.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mitre_bert_base_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mitre_bert_base_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mitre_bert_base_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.2 MB| + +## References + +https://huggingface.co/bencyc1129/mitre-bert-base-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-mobilebert_sanskrit_saskta_pre_training_complete_en.md b/docs/_posts/ahmedlone127/2024-09-25-mobilebert_sanskrit_saskta_pre_training_complete_en.md new file mode 100644 index 00000000000000..270462cd6b8caf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-mobilebert_sanskrit_saskta_pre_training_complete_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mobilebert_sanskrit_saskta_pre_training_complete BertEmbeddings from gokuls +author: John Snow Labs +name: mobilebert_sanskrit_saskta_pre_training_complete +date: 2024-09-25 +tags: [en, open_source, onnx, embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mobilebert_sanskrit_saskta_pre_training_complete` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mobilebert_sanskrit_saskta_pre_training_complete_en_5.5.0_3.0_1727241248026.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mobilebert_sanskrit_saskta_pre_training_complete_en_5.5.0_3.0_1727241248026.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +embeddings = BertEmbeddings.pretrained("mobilebert_sanskrit_saskta_pre_training_complete","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val embeddings = BertEmbeddings.pretrained("mobilebert_sanskrit_saskta_pre_training_complete","en") + .setInputCols(Array("document", "token")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mobilebert_sanskrit_saskta_pre_training_complete| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[bert]| +|Language:|en| +|Size:|92.5 MB| + +## References + +https://huggingface.co/gokuls/mobilebert_sa_pre-training-complete \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-mobilebert_sanskrit_saskta_pre_training_complete_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-mobilebert_sanskrit_saskta_pre_training_complete_pipeline_en.md new file mode 100644 index 00000000000000..4a8b8a87ceba3a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-mobilebert_sanskrit_saskta_pre_training_complete_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English mobilebert_sanskrit_saskta_pre_training_complete_pipeline pipeline BertEmbeddings from gokuls +author: John Snow Labs +name: mobilebert_sanskrit_saskta_pre_training_complete_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mobilebert_sanskrit_saskta_pre_training_complete_pipeline` is a English model originally trained by gokuls. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mobilebert_sanskrit_saskta_pre_training_complete_pipeline_en_5.5.0_3.0_1727241252743.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mobilebert_sanskrit_saskta_pre_training_complete_pipeline_en_5.5.0_3.0_1727241252743.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("mobilebert_sanskrit_saskta_pre_training_complete_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("mobilebert_sanskrit_saskta_pre_training_complete_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mobilebert_sanskrit_saskta_pre_training_complete_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|92.5 MB| + +## References + +https://huggingface.co/gokuls/mobilebert_sa_pre-training-complete + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-mobilebert_stsb_en.md b/docs/_posts/ahmedlone127/2024-09-25-mobilebert_stsb_en.md new file mode 100644 index 00000000000000..74403cf3ff6ec4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-mobilebert_stsb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English mobilebert_stsb BertForSequenceClassification from Alireza1044 +author: John Snow Labs +name: mobilebert_stsb +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`mobilebert_stsb` is a English model originally trained by Alireza1044. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/mobilebert_stsb_en_5.5.0_3.0_1727287378824.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/mobilebert_stsb_en_5.5.0_3.0_1727287378824.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("mobilebert_stsb","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("mobilebert_stsb", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|mobilebert_stsb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|92.5 MB| + +## References + +https://huggingface.co/Alireza1044/mobilebert_stsb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-modela_1_12_2023_en.md b/docs/_posts/ahmedlone127/2024-09-25-modela_1_12_2023_en.md new file mode 100644 index 00000000000000..c8782ff8345c11 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-modela_1_12_2023_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English modela_1_12_2023 BertForTokenClassification from MaryDatascientist +author: John Snow Labs +name: modela_1_12_2023 +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`modela_1_12_2023` is a English model originally trained by MaryDatascientist. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/modela_1_12_2023_en_5.5.0_3.0_1727264606089.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/modela_1_12_2023_en_5.5.0_3.0_1727264606089.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("modela_1_12_2023","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("modela_1_12_2023", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|modela_1_12_2023| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/MaryDatascientist/modelA_1_12_2023 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-modelo_racismo_9_april_24_en.md b/docs/_posts/ahmedlone127/2024-09-25-modelo_racismo_9_april_24_en.md new file mode 100644 index 00000000000000..adc1bc99bf607e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-modelo_racismo_9_april_24_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English modelo_racismo_9_april_24 BertForSequenceClassification from leofn3 +author: John Snow Labs +name: modelo_racismo_9_april_24 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`modelo_racismo_9_april_24` is a English model originally trained by leofn3. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/modelo_racismo_9_april_24_en_5.5.0_3.0_1727276484310.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/modelo_racismo_9_april_24_en_5.5.0_3.0_1727276484310.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("modelo_racismo_9_april_24","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("modelo_racismo_9_april_24", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|modelo_racismo_9_april_24| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|667.3 MB| + +## References + +https://huggingface.co/leofn3/modelo_racismo_9_april_24 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-movie_genre_classifier_davooddkareshki_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-movie_genre_classifier_davooddkareshki_pipeline_en.md new file mode 100644 index 00000000000000..543cdb77d93ed0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-movie_genre_classifier_davooddkareshki_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English movie_genre_classifier_davooddkareshki_pipeline pipeline BertForSequenceClassification from davooddkareshki +author: John Snow Labs +name: movie_genre_classifier_davooddkareshki_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`movie_genre_classifier_davooddkareshki_pipeline` is a English model originally trained by davooddkareshki. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/movie_genre_classifier_davooddkareshki_pipeline_en_5.5.0_3.0_1727277484777.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/movie_genre_classifier_davooddkareshki_pipeline_en_5.5.0_3.0_1727277484777.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("movie_genre_classifier_davooddkareshki_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("movie_genre_classifier_davooddkareshki_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|movie_genre_classifier_davooddkareshki_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/davooddkareshki/Movie_Genre_Classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-multitaskdistilledmodel_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-multitaskdistilledmodel_pipeline_en.md new file mode 100644 index 00000000000000..f6ee94c889d611 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-multitaskdistilledmodel_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English multitaskdistilledmodel_pipeline pipeline BertForSequenceClassification from privacy-tech-lab +author: John Snow Labs +name: multitaskdistilledmodel_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`multitaskdistilledmodel_pipeline` is a English model originally trained by privacy-tech-lab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/multitaskdistilledmodel_pipeline_en_5.5.0_3.0_1727286271829.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/multitaskdistilledmodel_pipeline_en_5.5.0_3.0_1727286271829.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("multitaskdistilledmodel_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("multitaskdistilledmodel_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|multitaskdistilledmodel_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|54.2 MB| + +## References + +https://huggingface.co/privacy-tech-lab/MultitaskDistilledModel + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-n_bert_imdb_padding20model_en.md b/docs/_posts/ahmedlone127/2024-09-25-n_bert_imdb_padding20model_en.md new file mode 100644 index 00000000000000..b856b80e592ac3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-n_bert_imdb_padding20model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English n_bert_imdb_padding20model BertForSequenceClassification from Realgon +author: John Snow Labs +name: n_bert_imdb_padding20model +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_bert_imdb_padding20model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_bert_imdb_padding20model_en_5.5.0_3.0_1727267427419.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_bert_imdb_padding20model_en_5.5.0_3.0_1727267427419.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("n_bert_imdb_padding20model","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("n_bert_imdb_padding20model", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_bert_imdb_padding20model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/Realgon/N_bert_imdb_padding20model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-n_bert_imdb_padding80model_en.md b/docs/_posts/ahmedlone127/2024-09-25-n_bert_imdb_padding80model_en.md new file mode 100644 index 00000000000000..a19eb7f8a5f1bf --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-n_bert_imdb_padding80model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English n_bert_imdb_padding80model BertForSequenceClassification from Realgon +author: John Snow Labs +name: n_bert_imdb_padding80model +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_bert_imdb_padding80model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_bert_imdb_padding80model_en_5.5.0_3.0_1727278327518.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_bert_imdb_padding80model_en_5.5.0_3.0_1727278327518.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("n_bert_imdb_padding80model","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("n_bert_imdb_padding80model", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_bert_imdb_padding80model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.6 MB| + +## References + +https://huggingface.co/Realgon/N_bert_imdb_padding80model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-n_bert_imdb_padding80model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-n_bert_imdb_padding80model_pipeline_en.md new file mode 100644 index 00000000000000..ee2b4479aa1eaa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-n_bert_imdb_padding80model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English n_bert_imdb_padding80model_pipeline pipeline BertForSequenceClassification from Realgon +author: John Snow Labs +name: n_bert_imdb_padding80model_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_bert_imdb_padding80model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_bert_imdb_padding80model_pipeline_en_5.5.0_3.0_1727278348554.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_bert_imdb_padding80model_pipeline_en_5.5.0_3.0_1727278348554.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("n_bert_imdb_padding80model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("n_bert_imdb_padding80model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_bert_imdb_padding80model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.7 MB| + +## References + +https://huggingface.co/Realgon/N_bert_imdb_padding80model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-n_bert_sst5_padding100model_en.md b/docs/_posts/ahmedlone127/2024-09-25-n_bert_sst5_padding100model_en.md new file mode 100644 index 00000000000000..01c3011bf6a77c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-n_bert_sst5_padding100model_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English n_bert_sst5_padding100model BertForSequenceClassification from Realgon +author: John Snow Labs +name: n_bert_sst5_padding100model +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_bert_sst5_padding100model` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_bert_sst5_padding100model_en_5.5.0_3.0_1727278694323.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_bert_sst5_padding100model_en_5.5.0_3.0_1727278694323.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("n_bert_sst5_padding100model","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("n_bert_sst5_padding100model", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_bert_sst5_padding100model| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.7 MB| + +## References + +https://huggingface.co/Realgon/N_bert_sst5_padding100model \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-n_bert_twitterfin_padding60model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-n_bert_twitterfin_padding60model_pipeline_en.md new file mode 100644 index 00000000000000..88cdf45a1b0aab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-n_bert_twitterfin_padding60model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English n_bert_twitterfin_padding60model_pipeline pipeline BertForSequenceClassification from Realgon +author: John Snow Labs +name: n_bert_twitterfin_padding60model_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_bert_twitterfin_padding60model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_bert_twitterfin_padding60model_pipeline_en_5.5.0_3.0_1727286190095.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_bert_twitterfin_padding60model_pipeline_en_5.5.0_3.0_1727286190095.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("n_bert_twitterfin_padding60model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("n_bert_twitterfin_padding60model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_bert_twitterfin_padding60model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.6 MB| + +## References + +https://huggingface.co/Realgon/N_bert_twitterfin_padding60model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-n_bert_twitterfin_padding90model_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-n_bert_twitterfin_padding90model_pipeline_en.md new file mode 100644 index 00000000000000..b27a287b77c8d6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-n_bert_twitterfin_padding90model_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English n_bert_twitterfin_padding90model_pipeline pipeline BertForSequenceClassification from Realgon +author: John Snow Labs +name: n_bert_twitterfin_padding90model_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`n_bert_twitterfin_padding90model_pipeline` is a English model originally trained by Realgon. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/n_bert_twitterfin_padding90model_pipeline_en_5.5.0_3.0_1727279592031.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/n_bert_twitterfin_padding90model_pipeline_en_5.5.0_3.0_1727279592031.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("n_bert_twitterfin_padding90model_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("n_bert_twitterfin_padding90model_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|n_bert_twitterfin_padding90model_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.7 MB| + +## References + +https://huggingface.co/Realgon/N_bert_twitterfin_padding90model + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-name_anonymization_tr.md b/docs/_posts/ahmedlone127/2024-09-25-name_anonymization_tr.md new file mode 100644 index 00000000000000..78d78b901202d1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-name_anonymization_tr.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Turkish name_anonymization BertForTokenClassification from deprem-ml +author: John Snow Labs +name: name_anonymization +date: 2024-09-25 +tags: [tr, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`name_anonymization` is a Turkish model originally trained by deprem-ml. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/name_anonymization_tr_5.5.0_3.0_1727284019137.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/name_anonymization_tr_5.5.0_3.0_1727284019137.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("name_anonymization","tr") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("name_anonymization", "tr") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|name_anonymization| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|tr| +|Size:|412.3 MB| + +## References + +https://huggingface.co/deprem-ml/name_anonymization \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-ner_bert_ingredients_en.md b/docs/_posts/ahmedlone127/2024-09-25-ner_bert_ingredients_en.md new file mode 100644 index 00000000000000..5de06465fcc7ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-ner_bert_ingredients_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ner_bert_ingredients BertForTokenClassification from Shresthadev403 +author: John Snow Labs +name: ner_bert_ingredients +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_bert_ingredients` is a English model originally trained by Shresthadev403. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_bert_ingredients_en_5.5.0_3.0_1727260673154.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_bert_ingredients_en_5.5.0_3.0_1727260673154.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("ner_bert_ingredients","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("ner_bert_ingredients", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_bert_ingredients| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.3 MB| + +## References + +https://huggingface.co/Shresthadev403/ner-bert-ingredients \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-ner_darijabert_arabizi_en.md b/docs/_posts/ahmedlone127/2024-09-25-ner_darijabert_arabizi_en.md new file mode 100644 index 00000000000000..1ff21d48ed9022 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-ner_darijabert_arabizi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ner_darijabert_arabizi BertForTokenClassification from Oelbourki +author: John Snow Labs +name: ner_darijabert_arabizi +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_darijabert_arabizi` is a English model originally trained by Oelbourki. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_darijabert_arabizi_en_5.5.0_3.0_1727282192543.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_darijabert_arabizi_en_5.5.0_3.0_1727282192543.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("ner_darijabert_arabizi","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("ner_darijabert_arabizi", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_darijabert_arabizi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|634.9 MB| + +## References + +https://huggingface.co/Oelbourki/ner-DarijaBERT-arabizi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-ner_harem_bert_base_portuguese_cased_en.md b/docs/_posts/ahmedlone127/2024-09-25-ner_harem_bert_base_portuguese_cased_en.md new file mode 100644 index 00000000000000..9e4df60e54d0aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-ner_harem_bert_base_portuguese_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ner_harem_bert_base_portuguese_cased BertForTokenClassification from liaad +author: John Snow Labs +name: ner_harem_bert_base_portuguese_cased +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_harem_bert_base_portuguese_cased` is a English model originally trained by liaad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_harem_bert_base_portuguese_cased_en_5.5.0_3.0_1727248241432.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_harem_bert_base_portuguese_cased_en_5.5.0_3.0_1727248241432.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("ner_harem_bert_base_portuguese_cased","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("ner_harem_bert_base_portuguese_cased", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_harem_bert_base_portuguese_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|406.0 MB| + +## References + +https://huggingface.co/liaad/NER_harem_bert-base-portuguese-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-ner_harem_bert_base_portuguese_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-ner_harem_bert_base_portuguese_cased_pipeline_en.md new file mode 100644 index 00000000000000..aa596ca52a5c6b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-ner_harem_bert_base_portuguese_cased_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ner_harem_bert_base_portuguese_cased_pipeline pipeline BertForTokenClassification from liaad +author: John Snow Labs +name: ner_harem_bert_base_portuguese_cased_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_harem_bert_base_portuguese_cased_pipeline` is a English model originally trained by liaad. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_harem_bert_base_portuguese_cased_pipeline_en_5.5.0_3.0_1727248267196.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_harem_bert_base_portuguese_cased_pipeline_en_5.5.0_3.0_1727248267196.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ner_harem_bert_base_portuguese_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ner_harem_bert_base_portuguese_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_harem_bert_base_portuguese_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|406.0 MB| + +## References + +https://huggingface.co/liaad/NER_harem_bert-base-portuguese-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-ner_resume_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-ner_resume_pipeline_en.md new file mode 100644 index 00000000000000..c0153ed3e40665 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-ner_resume_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ner_resume_pipeline pipeline BertForTokenClassification from ClaudiuFilip1100 +author: John Snow Labs +name: ner_resume_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ner_resume_pipeline` is a English model originally trained by ClaudiuFilip1100. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ner_resume_pipeline_en_5.5.0_3.0_1727270974736.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ner_resume_pipeline_en_5.5.0_3.0_1727270974736.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ner_resume_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ner_resume_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ner_resume_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/ClaudiuFilip1100/ner-resume + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-news_category_classifier_distilbert_en.md b/docs/_posts/ahmedlone127/2024-09-25-news_category_classifier_distilbert_en.md new file mode 100644 index 00000000000000..32fe4dafec34cd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-news_category_classifier_distilbert_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English news_category_classifier_distilbert BertForSequenceClassification from dima806 +author: John Snow Labs +name: news_category_classifier_distilbert +date: 2024-09-25 +tags: [bert, en, open_source, sequence_classification, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`news_category_classifier_distilbert` is a English model originally trained by dima806. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/news_category_classifier_distilbert_en_5.5.0_3.0_1727268666692.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/news_category_classifier_distilbert_en_5.5.0_3.0_1727268666692.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +tokenizer = Tokenizer()\ + .setInputCols("document")\ + .setOutputCol("token") + +sequenceClassifier = BertForSequenceClassification.pretrained("news_category_classifier_distilbert","en")\ + .setInputCols(["document","token"])\ + .setOutputCol("class") + +pipeline = Pipeline().setStages([document_assembler, tokenizer, sequenceClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("news_category_classifier_distilbert","en") + .setInputCols(Array("document","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|news_category_classifier_distilbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.5 MB| + +## References + +References + +https://huggingface.co/dima806/news-category-classifier-distilbert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-news_category_classifier_distilbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-news_category_classifier_distilbert_pipeline_en.md new file mode 100644 index 00000000000000..190923ed5fc0a4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-news_category_classifier_distilbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English news_category_classifier_distilbert_pipeline pipeline BertForSequenceClassification from wnic00 +author: John Snow Labs +name: news_category_classifier_distilbert_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`news_category_classifier_distilbert_pipeline` is a English model originally trained by wnic00. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/news_category_classifier_distilbert_pipeline_en_5.5.0_3.0_1727268689039.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/news_category_classifier_distilbert_pipeline_en_5.5.0_3.0_1727268689039.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("news_category_classifier_distilbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("news_category_classifier_distilbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|news_category_classifier_distilbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/wnic00/news-category-classifier-distilbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-nlp_sardinian_based_on_bert_en.md b/docs/_posts/ahmedlone127/2024-09-25-nlp_sardinian_based_on_bert_en.md new file mode 100644 index 00000000000000..31d16c98ba12f9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-nlp_sardinian_based_on_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English nlp_sardinian_based_on_bert BertForSequenceClassification from 4TB-USTC +author: John Snow Labs +name: nlp_sardinian_based_on_bert +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nlp_sardinian_based_on_bert` is a English model originally trained by 4TB-USTC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp_sardinian_based_on_bert_en_5.5.0_3.0_1727288469493.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp_sardinian_based_on_bert_en_5.5.0_3.0_1727288469493.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("nlp_sardinian_based_on_bert","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("nlp_sardinian_based_on_bert", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nlp_sardinian_based_on_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/4TB-USTC/nlp_sc_based_on_bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-nlp_sardinian_based_on_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-nlp_sardinian_based_on_bert_pipeline_en.md new file mode 100644 index 00000000000000..ccdd277d28fb1b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-nlp_sardinian_based_on_bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English nlp_sardinian_based_on_bert_pipeline pipeline BertForSequenceClassification from 4TB-USTC +author: John Snow Labs +name: nlp_sardinian_based_on_bert_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nlp_sardinian_based_on_bert_pipeline` is a English model originally trained by 4TB-USTC. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nlp_sardinian_based_on_bert_pipeline_en_5.5.0_3.0_1727288492519.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nlp_sardinian_based_on_bert_pipeline_en_5.5.0_3.0_1727288492519.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nlp_sardinian_based_on_bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nlp_sardinian_based_on_bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nlp_sardinian_based_on_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/4TB-USTC/nlp_sc_based_on_bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-nonsense_gibberish_detector_pipeline_ru.md b/docs/_posts/ahmedlone127/2024-09-25-nonsense_gibberish_detector_pipeline_ru.md new file mode 100644 index 00000000000000..d636f7605fcbb2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-nonsense_gibberish_detector_pipeline_ru.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Russian nonsense_gibberish_detector_pipeline pipeline BertForSequenceClassification from Den4ikAI +author: John Snow Labs +name: nonsense_gibberish_detector_pipeline +date: 2024-09-25 +tags: [ru, open_source, pipeline, onnx] +task: Text Classification +language: ru +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nonsense_gibberish_detector_pipeline` is a Russian model originally trained by Den4ikAI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nonsense_gibberish_detector_pipeline_ru_5.5.0_3.0_1727239983491.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nonsense_gibberish_detector_pipeline_ru_5.5.0_3.0_1727239983491.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("nonsense_gibberish_detector_pipeline", lang = "ru") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("nonsense_gibberish_detector_pipeline", lang = "ru") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nonsense_gibberish_detector_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ru| +|Size:|426.2 MB| + +## References + +https://huggingface.co/Den4ikAI/nonsense_gibberish_detector + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-nonsense_gibberish_detector_ru.md b/docs/_posts/ahmedlone127/2024-09-25-nonsense_gibberish_detector_ru.md new file mode 100644 index 00000000000000..6c85ad9486dede --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-nonsense_gibberish_detector_ru.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Russian nonsense_gibberish_detector BertForSequenceClassification from Den4ikAI +author: John Snow Labs +name: nonsense_gibberish_detector +date: 2024-09-25 +tags: [ru, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: ru +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`nonsense_gibberish_detector` is a Russian model originally trained by Den4ikAI. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/nonsense_gibberish_detector_ru_5.5.0_3.0_1727239962190.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/nonsense_gibberish_detector_ru_5.5.0_3.0_1727239962190.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("nonsense_gibberish_detector","ru") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("nonsense_gibberish_detector", "ru") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|nonsense_gibberish_detector| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|ru| +|Size:|426.2 MB| + +## References + +https://huggingface.co/Den4ikAI/nonsense_gibberish_detector \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-norwegian_bokml_whisper_base_no.md b/docs/_posts/ahmedlone127/2024-09-25-norwegian_bokml_whisper_base_no.md new file mode 100644 index 00000000000000..697d99f4cdec8e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-norwegian_bokml_whisper_base_no.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Norwegian norwegian_bokml_whisper_base WhisperForCTC from NbAiLab +author: John Snow Labs +name: norwegian_bokml_whisper_base +date: 2024-09-25 +tags: ["no", open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: "no" +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`norwegian_bokml_whisper_base` is a Norwegian model originally trained by NbAiLab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/norwegian_bokml_whisper_base_no_5.5.0_3.0_1727223458869.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/norwegian_bokml_whisper_base_no_5.5.0_3.0_1727223458869.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("norwegian_bokml_whisper_base","no") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("norwegian_bokml_whisper_base", "no") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|norwegian_bokml_whisper_base| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|no| +|Size:|633.6 MB| + +## References + +https://huggingface.co/NbAiLab/nb-whisper-base \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-norwegian_bokml_whisper_small_no.md b/docs/_posts/ahmedlone127/2024-09-25-norwegian_bokml_whisper_small_no.md new file mode 100644 index 00000000000000..d42a70e5b0f9de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-norwegian_bokml_whisper_small_no.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Norwegian norwegian_bokml_whisper_small WhisperForCTC from NbAiLabBeta +author: John Snow Labs +name: norwegian_bokml_whisper_small +date: 2024-09-25 +tags: ["no", open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: "no" +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`norwegian_bokml_whisper_small` is a Norwegian model originally trained by NbAiLabBeta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/norwegian_bokml_whisper_small_no_5.5.0_3.0_1727223710912.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/norwegian_bokml_whisper_small_no_5.5.0_3.0_1727223710912.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("norwegian_bokml_whisper_small","no") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("norwegian_bokml_whisper_small", "no") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|norwegian_bokml_whisper_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|no| +|Size:|1.7 GB| + +## References + +https://huggingface.co/NbAiLabBeta/nb-whisper-small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-norwegian_bokml_whisper_small_pipeline_no.md b/docs/_posts/ahmedlone127/2024-09-25-norwegian_bokml_whisper_small_pipeline_no.md new file mode 100644 index 00000000000000..c4be1e2748114d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-norwegian_bokml_whisper_small_pipeline_no.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Norwegian norwegian_bokml_whisper_small_pipeline pipeline WhisperForCTC from NbAiLabBeta +author: John Snow Labs +name: norwegian_bokml_whisper_small_pipeline +date: 2024-09-25 +tags: ["no", open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: "no" +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`norwegian_bokml_whisper_small_pipeline` is a Norwegian model originally trained by NbAiLabBeta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/norwegian_bokml_whisper_small_pipeline_no_5.5.0_3.0_1727223803220.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/norwegian_bokml_whisper_small_pipeline_no_5.5.0_3.0_1727223803220.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("norwegian_bokml_whisper_small_pipeline", lang = "no") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("norwegian_bokml_whisper_small_pipeline", lang = "no") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|norwegian_bokml_whisper_small_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|no| +|Size:|1.7 GB| + +## References + +https://huggingface.co/NbAiLabBeta/nb-whisper-small + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-norwegian_bokml_whisper_tiny_nbailab_no.md b/docs/_posts/ahmedlone127/2024-09-25-norwegian_bokml_whisper_tiny_nbailab_no.md new file mode 100644 index 00000000000000..02c44cae29ff62 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-norwegian_bokml_whisper_tiny_nbailab_no.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Norwegian norwegian_bokml_whisper_tiny_nbailab WhisperForCTC from NbAiLab +author: John Snow Labs +name: norwegian_bokml_whisper_tiny_nbailab +date: 2024-09-25 +tags: ["no", open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: "no" +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`norwegian_bokml_whisper_tiny_nbailab` is a Norwegian model originally trained by NbAiLab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/norwegian_bokml_whisper_tiny_nbailab_no_5.5.0_3.0_1727227479762.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/norwegian_bokml_whisper_tiny_nbailab_no_5.5.0_3.0_1727227479762.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("norwegian_bokml_whisper_tiny_nbailab","no") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("norwegian_bokml_whisper_tiny_nbailab", "no") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|norwegian_bokml_whisper_tiny_nbailab| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|no| +|Size:|384.2 MB| + +## References + +https://huggingface.co/NbAiLab/nb-whisper-tiny \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-norwegian_bokml_whisper_tiny_nbailab_pipeline_no.md b/docs/_posts/ahmedlone127/2024-09-25-norwegian_bokml_whisper_tiny_nbailab_pipeline_no.md new file mode 100644 index 00000000000000..202103aef7b4e6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-norwegian_bokml_whisper_tiny_nbailab_pipeline_no.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Norwegian norwegian_bokml_whisper_tiny_nbailab_pipeline pipeline WhisperForCTC from NbAiLab +author: John Snow Labs +name: norwegian_bokml_whisper_tiny_nbailab_pipeline +date: 2024-09-25 +tags: ["no", open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: "no" +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`norwegian_bokml_whisper_tiny_nbailab_pipeline` is a Norwegian model originally trained by NbAiLab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/norwegian_bokml_whisper_tiny_nbailab_pipeline_no_5.5.0_3.0_1727227502382.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/norwegian_bokml_whisper_tiny_nbailab_pipeline_no_5.5.0_3.0_1727227502382.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("norwegian_bokml_whisper_tiny_nbailab_pipeline", lang = "no") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("norwegian_bokml_whisper_tiny_nbailab_pipeline", lang = "no") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|norwegian_bokml_whisper_tiny_nbailab_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|no| +|Size:|384.2 MB| + +## References + +https://huggingface.co/NbAiLab/nb-whisper-tiny + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-norwegian_bokml_whisper_tiny_nbailabbeta_no.md b/docs/_posts/ahmedlone127/2024-09-25-norwegian_bokml_whisper_tiny_nbailabbeta_no.md new file mode 100644 index 00000000000000..b52c0cbab903db --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-norwegian_bokml_whisper_tiny_nbailabbeta_no.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Norwegian norwegian_bokml_whisper_tiny_nbailabbeta WhisperForCTC from NbAiLabBeta +author: John Snow Labs +name: norwegian_bokml_whisper_tiny_nbailabbeta +date: 2024-09-25 +tags: ["no", open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: "no" +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`norwegian_bokml_whisper_tiny_nbailabbeta` is a Norwegian model originally trained by NbAiLabBeta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/norwegian_bokml_whisper_tiny_nbailabbeta_no_5.5.0_3.0_1727224996671.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/norwegian_bokml_whisper_tiny_nbailabbeta_no_5.5.0_3.0_1727224996671.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("norwegian_bokml_whisper_tiny_nbailabbeta","no") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("norwegian_bokml_whisper_tiny_nbailabbeta", "no") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|norwegian_bokml_whisper_tiny_nbailabbeta| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|no| +|Size:|384.2 MB| + +## References + +https://huggingface.co/NbAiLabBeta/nb-whisper-tiny \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-opticalbert_cner_cased_en.md b/docs/_posts/ahmedlone127/2024-09-25-opticalbert_cner_cased_en.md new file mode 100644 index 00000000000000..951494a8e34b4a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-opticalbert_cner_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English opticalbert_cner_cased BertForTokenClassification from opticalmaterials +author: John Snow Labs +name: opticalbert_cner_cased +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opticalbert_cner_cased` is a English model originally trained by opticalmaterials. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opticalbert_cner_cased_en_5.5.0_3.0_1727248350546.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opticalbert_cner_cased_en_5.5.0_3.0_1727248350546.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("opticalbert_cner_cased","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("opticalbert_cner_cased", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opticalbert_cner_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|403.5 MB| + +## References + +https://huggingface.co/opticalmaterials/opticalbert_cner_cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-opticalbert_cner_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-opticalbert_cner_cased_pipeline_en.md new file mode 100644 index 00000000000000..3ca3b95bea7f1b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-opticalbert_cner_cased_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opticalbert_cner_cased_pipeline pipeline BertForTokenClassification from opticalmaterials +author: John Snow Labs +name: opticalbert_cner_cased_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opticalbert_cner_cased_pipeline` is a English model originally trained by opticalmaterials. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opticalbert_cner_cased_pipeline_en_5.5.0_3.0_1727248371555.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opticalbert_cner_cased_pipeline_en_5.5.0_3.0_1727248371555.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opticalbert_cner_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opticalbert_cner_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opticalbert_cner_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|403.6 MB| + +## References + +https://huggingface.co/opticalmaterials/opticalbert_cner_cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-opus_em_augmented_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-opus_em_augmented_pipeline_en.md new file mode 100644 index 00000000000000..9c44c860a7839e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-opus_em_augmented_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English opus_em_augmented_pipeline pipeline BertForSequenceClassification from keremp +author: John Snow Labs +name: opus_em_augmented_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`opus_em_augmented_pipeline` is a English model originally trained by keremp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/opus_em_augmented_pipeline_en_5.5.0_3.0_1727267676216.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/opus_em_augmented_pipeline_en_5.5.0_3.0_1727267676216.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("opus_em_augmented_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("opus_em_augmented_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|opus_em_augmented_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/keremp/opus-em-augmented + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-out_glue_mnli_en.md b/docs/_posts/ahmedlone127/2024-09-25-out_glue_mnli_en.md new file mode 100644 index 00000000000000..483ba4fe648106 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-out_glue_mnli_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English out_glue_mnli BertForSequenceClassification from Tural +author: John Snow Labs +name: out_glue_mnli +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`out_glue_mnli` is a English model originally trained by Tural. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/out_glue_mnli_en_5.5.0_3.0_1727264080681.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/out_glue_mnli_en_5.5.0_3.0_1727264080681.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("out_glue_mnli","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("out_glue_mnli", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|out_glue_mnli| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|410.2 MB| + +## References + +https://huggingface.co/Tural/out-glue-mnli \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-out_glue_mnli_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-out_glue_mnli_pipeline_en.md new file mode 100644 index 00000000000000..9b2648ebf2623c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-out_glue_mnli_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English out_glue_mnli_pipeline pipeline BertForSequenceClassification from Tural +author: John Snow Labs +name: out_glue_mnli_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`out_glue_mnli_pipeline` is a English model originally trained by Tural. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/out_glue_mnli_pipeline_en_5.5.0_3.0_1727264103708.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/out_glue_mnli_pipeline_en_5.5.0_3.0_1727264103708.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("out_glue_mnli_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("out_glue_mnli_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|out_glue_mnli_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|410.3 MB| + +## References + +https://huggingface.co/Tural/out-glue-mnli + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-pabee_bert_base_sst2_en.md b/docs/_posts/ahmedlone127/2024-09-25-pabee_bert_base_sst2_en.md new file mode 100644 index 00000000000000..cca658c28071ff --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-pabee_bert_base_sst2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English pabee_bert_base_sst2 BertForSequenceClassification from mattymchen +author: John Snow Labs +name: pabee_bert_base_sst2 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`pabee_bert_base_sst2` is a English model originally trained by mattymchen. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pabee_bert_base_sst2_en_5.5.0_3.0_1727276199313.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pabee_bert_base_sst2_en_5.5.0_3.0_1727276199313.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("pabee_bert_base_sst2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("pabee_bert_base_sst2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pabee_bert_base_sst2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/mattymchen/pabee-bert-base-sst2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-pardonmyai_tiny_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-pardonmyai_tiny_pipeline_en.md new file mode 100644 index 00000000000000..0abc2c9a8c7d10 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-pardonmyai_tiny_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English pardonmyai_tiny_pipeline pipeline BertForSequenceClassification from tarekziade +author: John Snow Labs +name: pardonmyai_tiny_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`pardonmyai_tiny_pipeline` is a English model originally trained by tarekziade. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/pardonmyai_tiny_pipeline_en_5.5.0_3.0_1727276171723.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/pardonmyai_tiny_pipeline_en_5.5.0_3.0_1727276171723.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("pardonmyai_tiny_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("pardonmyai_tiny_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|pardonmyai_tiny_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|54.2 MB| + +## References + +https://huggingface.co/tarekziade/pardonmyai-tiny + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-phil_oriya_not_v1_2_en.md b/docs/_posts/ahmedlone127/2024-09-25-phil_oriya_not_v1_2_en.md new file mode 100644 index 00000000000000..667746d5b84017 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-phil_oriya_not_v1_2_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English phil_oriya_not_v1_2 BertForSequenceClassification from dbourget +author: John Snow Labs +name: phil_oriya_not_v1_2 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`phil_oriya_not_v1_2` is a English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phil_oriya_not_v1_2_en_5.5.0_3.0_1727256892636.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phil_oriya_not_v1_2_en_5.5.0_3.0_1727256892636.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("phil_oriya_not_v1_2","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("phil_oriya_not_v1_2", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|phil_oriya_not_v1_2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dbourget/phil-or-not-v1.2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-phil_oriya_not_v1_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-phil_oriya_not_v1_2_pipeline_en.md new file mode 100644 index 00000000000000..87eea4fd656329 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-phil_oriya_not_v1_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English phil_oriya_not_v1_2_pipeline pipeline BertForSequenceClassification from dbourget +author: John Snow Labs +name: phil_oriya_not_v1_2_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`phil_oriya_not_v1_2_pipeline` is a English model originally trained by dbourget. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phil_oriya_not_v1_2_pipeline_en_5.5.0_3.0_1727256957656.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phil_oriya_not_v1_2_pipeline_en_5.5.0_3.0_1727256957656.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("phil_oriya_not_v1_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("phil_oriya_not_v1_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|phil_oriya_not_v1_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/dbourget/phil-or-not-v1.2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_akode_en.md b/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_akode_en.md new file mode 100644 index 00000000000000..d99ff5b65001d3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_akode_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English phrasebank_sentiment_analysis_akode BertForSequenceClassification from akode +author: John Snow Labs +name: phrasebank_sentiment_analysis_akode +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`phrasebank_sentiment_analysis_akode` is a English model originally trained by akode. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phrasebank_sentiment_analysis_akode_en_5.5.0_3.0_1727285371700.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phrasebank_sentiment_analysis_akode_en_5.5.0_3.0_1727285371700.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("phrasebank_sentiment_analysis_akode","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("phrasebank_sentiment_analysis_akode", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|phrasebank_sentiment_analysis_akode| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/akode/phrasebank-sentiment-analysis \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_amit7859_en.md b/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_amit7859_en.md new file mode 100644 index 00000000000000..d5e010a4275d7f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_amit7859_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English phrasebank_sentiment_analysis_amit7859 BertForSequenceClassification from amit7859 +author: John Snow Labs +name: phrasebank_sentiment_analysis_amit7859 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`phrasebank_sentiment_analysis_amit7859` is a English model originally trained by amit7859. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phrasebank_sentiment_analysis_amit7859_en_5.5.0_3.0_1727272646752.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phrasebank_sentiment_analysis_amit7859_en_5.5.0_3.0_1727272646752.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("phrasebank_sentiment_analysis_amit7859","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("phrasebank_sentiment_analysis_amit7859", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|phrasebank_sentiment_analysis_amit7859| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/amit7859/phrasebank-sentiment-analysis \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_amit7859_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_amit7859_pipeline_en.md new file mode 100644 index 00000000000000..0c53a832999ad5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_amit7859_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English phrasebank_sentiment_analysis_amit7859_pipeline pipeline BertForSequenceClassification from amit7859 +author: John Snow Labs +name: phrasebank_sentiment_analysis_amit7859_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`phrasebank_sentiment_analysis_amit7859_pipeline` is a English model originally trained by amit7859. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phrasebank_sentiment_analysis_amit7859_pipeline_en_5.5.0_3.0_1727272669484.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phrasebank_sentiment_analysis_amit7859_pipeline_en_5.5.0_3.0_1727272669484.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("phrasebank_sentiment_analysis_amit7859_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("phrasebank_sentiment_analysis_amit7859_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|phrasebank_sentiment_analysis_amit7859_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/amit7859/phrasebank-sentiment-analysis + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_fakhry_en.md b/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_fakhry_en.md new file mode 100644 index 00000000000000..897a7bd3bc6add --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_fakhry_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English phrasebank_sentiment_analysis_fakhry BertForSequenceClassification from Fakhry +author: John Snow Labs +name: phrasebank_sentiment_analysis_fakhry +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`phrasebank_sentiment_analysis_fakhry` is a English model originally trained by Fakhry. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phrasebank_sentiment_analysis_fakhry_en_5.5.0_3.0_1727279410361.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phrasebank_sentiment_analysis_fakhry_en_5.5.0_3.0_1727279410361.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("phrasebank_sentiment_analysis_fakhry","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("phrasebank_sentiment_analysis_fakhry", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|phrasebank_sentiment_analysis_fakhry| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Fakhry/phrasebank-sentiment-analysis \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_ramnathv_en.md b/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_ramnathv_en.md new file mode 100644 index 00000000000000..f48d1f786feef5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_ramnathv_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English phrasebank_sentiment_analysis_ramnathv BertForSequenceClassification from ramnathv +author: John Snow Labs +name: phrasebank_sentiment_analysis_ramnathv +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`phrasebank_sentiment_analysis_ramnathv` is a English model originally trained by ramnathv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phrasebank_sentiment_analysis_ramnathv_en_5.5.0_3.0_1727269785757.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phrasebank_sentiment_analysis_ramnathv_en_5.5.0_3.0_1727269785757.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("phrasebank_sentiment_analysis_ramnathv","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("phrasebank_sentiment_analysis_ramnathv", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|phrasebank_sentiment_analysis_ramnathv| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/ramnathv/phrasebank-sentiment-analysis \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_ramnathv_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_ramnathv_pipeline_en.md new file mode 100644 index 00000000000000..b648fd2cd26b26 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_ramnathv_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English phrasebank_sentiment_analysis_ramnathv_pipeline pipeline BertForSequenceClassification from ramnathv +author: John Snow Labs +name: phrasebank_sentiment_analysis_ramnathv_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`phrasebank_sentiment_analysis_ramnathv_pipeline` is a English model originally trained by ramnathv. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phrasebank_sentiment_analysis_ramnathv_pipeline_en_5.5.0_3.0_1727269808031.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phrasebank_sentiment_analysis_ramnathv_pipeline_en_5.5.0_3.0_1727269808031.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("phrasebank_sentiment_analysis_ramnathv_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("phrasebank_sentiment_analysis_ramnathv_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|phrasebank_sentiment_analysis_ramnathv_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/ramnathv/phrasebank-sentiment-analysis + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_richychn_en.md b/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_richychn_en.md new file mode 100644 index 00000000000000..7343cebb801ce8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_richychn_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English phrasebank_sentiment_analysis_richychn BertForSequenceClassification from richychn +author: John Snow Labs +name: phrasebank_sentiment_analysis_richychn +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`phrasebank_sentiment_analysis_richychn` is a English model originally trained by richychn. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phrasebank_sentiment_analysis_richychn_en_5.5.0_3.0_1727273091918.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phrasebank_sentiment_analysis_richychn_en_5.5.0_3.0_1727273091918.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("phrasebank_sentiment_analysis_richychn","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("phrasebank_sentiment_analysis_richychn", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|phrasebank_sentiment_analysis_richychn| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/richychn/phrasebank-sentiment-analysis \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_saiteja_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_saiteja_pipeline_en.md new file mode 100644 index 00000000000000..fed93220283bb9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_saiteja_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English phrasebank_sentiment_analysis_saiteja_pipeline pipeline BertForSequenceClassification from Saiteja +author: John Snow Labs +name: phrasebank_sentiment_analysis_saiteja_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`phrasebank_sentiment_analysis_saiteja_pipeline` is a English model originally trained by Saiteja. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phrasebank_sentiment_analysis_saiteja_pipeline_en_5.5.0_3.0_1727268451986.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phrasebank_sentiment_analysis_saiteja_pipeline_en_5.5.0_3.0_1727268451986.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("phrasebank_sentiment_analysis_saiteja_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("phrasebank_sentiment_analysis_saiteja_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|phrasebank_sentiment_analysis_saiteja_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Saiteja/phrasebank-sentiment-analysis + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_stolbiq_en.md b/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_stolbiq_en.md new file mode 100644 index 00000000000000..cf69fccff2ae7e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_stolbiq_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English phrasebank_sentiment_analysis_stolbiq BertForSequenceClassification from stolbiq +author: John Snow Labs +name: phrasebank_sentiment_analysis_stolbiq +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`phrasebank_sentiment_analysis_stolbiq` is a English model originally trained by stolbiq. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phrasebank_sentiment_analysis_stolbiq_en_5.5.0_3.0_1727266435933.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phrasebank_sentiment_analysis_stolbiq_en_5.5.0_3.0_1727266435933.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("phrasebank_sentiment_analysis_stolbiq","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("phrasebank_sentiment_analysis_stolbiq", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|phrasebank_sentiment_analysis_stolbiq| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/stolbiq/phrasebank-sentiment-analysis \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_stolbiq_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_stolbiq_pipeline_en.md new file mode 100644 index 00000000000000..d4ce18a19e2154 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-phrasebank_sentiment_analysis_stolbiq_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English phrasebank_sentiment_analysis_stolbiq_pipeline pipeline BertForSequenceClassification from stolbiq +author: John Snow Labs +name: phrasebank_sentiment_analysis_stolbiq_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`phrasebank_sentiment_analysis_stolbiq_pipeline` is a English model originally trained by stolbiq. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/phrasebank_sentiment_analysis_stolbiq_pipeline_en_5.5.0_3.0_1727266458662.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/phrasebank_sentiment_analysis_stolbiq_pipeline_en_5.5.0_3.0_1727266458662.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("phrasebank_sentiment_analysis_stolbiq_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("phrasebank_sentiment_analysis_stolbiq_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|phrasebank_sentiment_analysis_stolbiq_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/stolbiq/phrasebank-sentiment-analysis + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-polite_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-polite_bert_pipeline_en.md new file mode 100644 index 00000000000000..e9e7c4b8e5f627 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-polite_bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English polite_bert_pipeline pipeline BertForSequenceClassification from NOVA-vision-language +author: John Snow Labs +name: polite_bert_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`polite_bert_pipeline` is a English model originally trained by NOVA-vision-language. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/polite_bert_pipeline_en_5.5.0_3.0_1727269543128.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/polite_bert_pipeline_en_5.5.0_3.0_1727269543128.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("polite_bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("polite_bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|polite_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/NOVA-vision-language/polite_bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-postagger_bio_portuguese_pipeline_pt.md b/docs/_posts/ahmedlone127/2024-09-25-postagger_bio_portuguese_pipeline_pt.md new file mode 100644 index 00000000000000..1acf011046a6c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-postagger_bio_portuguese_pipeline_pt.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Portuguese postagger_bio_portuguese_pipeline pipeline BertForTokenClassification from pucpr-br +author: John Snow Labs +name: postagger_bio_portuguese_pipeline +date: 2024-09-25 +tags: [pt, open_source, pipeline, onnx] +task: Named Entity Recognition +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`postagger_bio_portuguese_pipeline` is a Portuguese model originally trained by pucpr-br. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/postagger_bio_portuguese_pipeline_pt_5.5.0_3.0_1727259109089.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/postagger_bio_portuguese_pipeline_pt_5.5.0_3.0_1727259109089.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("postagger_bio_portuguese_pipeline", lang = "pt") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("postagger_bio_portuguese_pipeline", lang = "pt") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|postagger_bio_portuguese_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|pt| +|Size:|665.0 MB| + +## References + +https://huggingface.co/pucpr-br/postagger-bio-portuguese + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-postagger_bio_portuguese_pt.md b/docs/_posts/ahmedlone127/2024-09-25-postagger_bio_portuguese_pt.md new file mode 100644 index 00000000000000..307d600f04cd15 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-postagger_bio_portuguese_pt.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Portuguese postagger_bio_portuguese BertForTokenClassification from pucpr-br +author: John Snow Labs +name: postagger_bio_portuguese +date: 2024-09-25 +tags: [pt, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`postagger_bio_portuguese` is a Portuguese model originally trained by pucpr-br. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/postagger_bio_portuguese_pt_5.5.0_3.0_1727259074952.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/postagger_bio_portuguese_pt_5.5.0_3.0_1727259074952.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("postagger_bio_portuguese","pt") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("postagger_bio_portuguese", "pt") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|postagger_bio_portuguese| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|pt| +|Size:|664.9 MB| + +## References + +https://huggingface.co/pucpr-br/postagger-bio-portuguese \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-propoint_final_project_en.md b/docs/_posts/ahmedlone127/2024-09-25-propoint_final_project_en.md new file mode 100644 index 00000000000000..277ffb3ecf8639 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-propoint_final_project_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English propoint_final_project BertForSequenceClassification from DataAngelo +author: John Snow Labs +name: propoint_final_project +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`propoint_final_project` is a English model originally trained by DataAngelo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/propoint_final_project_en_5.5.0_3.0_1727272671514.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/propoint_final_project_en_5.5.0_3.0_1727272671514.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("propoint_final_project","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("propoint_final_project", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|propoint_final_project| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/DataAngelo/propoint_Final_project \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-prototipo_7_emi_en.md b/docs/_posts/ahmedlone127/2024-09-25-prototipo_7_emi_en.md new file mode 100644 index 00000000000000..403836deb137d2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-prototipo_7_emi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English prototipo_7_emi BertForSequenceClassification from Armandodelca +author: John Snow Labs +name: prototipo_7_emi +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`prototipo_7_emi` is a English model originally trained by Armandodelca. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/prototipo_7_emi_en_5.5.0_3.0_1727270129321.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/prototipo_7_emi_en_5.5.0_3.0_1727270129321.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("prototipo_7_emi","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("prototipo_7_emi", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|prototipo_7_emi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|411.7 MB| + +## References + +https://huggingface.co/Armandodelca/Prototipo_7_EMI \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-prototipo_7_emi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-prototipo_7_emi_pipeline_en.md new file mode 100644 index 00000000000000..82034aa2fb77e1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-prototipo_7_emi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English prototipo_7_emi_pipeline pipeline BertForSequenceClassification from Armandodelca +author: John Snow Labs +name: prototipo_7_emi_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`prototipo_7_emi_pipeline` is a English model originally trained by Armandodelca. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/prototipo_7_emi_pipeline_en_5.5.0_3.0_1727270150743.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/prototipo_7_emi_pipeline_en_5.5.0_3.0_1727270150743.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("prototipo_7_emi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("prototipo_7_emi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|prototipo_7_emi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|411.7 MB| + +## References + +https://huggingface.co/Armandodelca/Prototipo_7_EMI + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-psychbert_finetuned_mentalhealth_en.md b/docs/_posts/ahmedlone127/2024-09-25-psychbert_finetuned_mentalhealth_en.md new file mode 100644 index 00000000000000..b96763d4612595 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-psychbert_finetuned_mentalhealth_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English psychbert_finetuned_mentalhealth BertForSequenceClassification from mnaylor +author: John Snow Labs +name: psychbert_finetuned_mentalhealth +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`psychbert_finetuned_mentalhealth` is a English model originally trained by mnaylor. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/psychbert_finetuned_mentalhealth_en_5.5.0_3.0_1727257449190.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/psychbert_finetuned_mentalhealth_en_5.5.0_3.0_1727257449190.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("psychbert_finetuned_mentalhealth","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("psychbert_finetuned_mentalhealth", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|psychbert_finetuned_mentalhealth| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/mnaylor/psychbert-finetuned-mentalhealth \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-re2g_reranker_fever_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-re2g_reranker_fever_pipeline_en.md new file mode 100644 index 00000000000000..220cb9a12218e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-re2g_reranker_fever_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English re2g_reranker_fever_pipeline pipeline BertForSequenceClassification from ibm +author: John Snow Labs +name: re2g_reranker_fever_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`re2g_reranker_fever_pipeline` is a English model originally trained by ibm. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/re2g_reranker_fever_pipeline_en_5.5.0_3.0_1727287247023.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/re2g_reranker_fever_pipeline_en_5.5.0_3.0_1727287247023.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("re2g_reranker_fever_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("re2g_reranker_fever_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|re2g_reranker_fever_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/ibm/re2g-reranker-fever + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-requirements_ambiguity_v2_nl.md b/docs/_posts/ahmedlone127/2024-09-25-requirements_ambiguity_v2_nl.md new file mode 100644 index 00000000000000..c4e79497f8196a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-requirements_ambiguity_v2_nl.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Dutch, Flemish requirements_ambiguity_v2 BertForSequenceClassification from denizspynk +author: John Snow Labs +name: requirements_ambiguity_v2 +date: 2024-09-25 +tags: [nl, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: nl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`requirements_ambiguity_v2` is a Dutch, Flemish model originally trained by denizspynk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/requirements_ambiguity_v2_nl_5.5.0_3.0_1727267361597.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/requirements_ambiguity_v2_nl_5.5.0_3.0_1727267361597.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("requirements_ambiguity_v2","nl") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("requirements_ambiguity_v2", "nl") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|requirements_ambiguity_v2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|nl| +|Size:|409.0 MB| + +## References + +https://huggingface.co/denizspynk/requirements_ambiguity_v2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-requirements_ambiguity_v2_pipeline_nl.md b/docs/_posts/ahmedlone127/2024-09-25-requirements_ambiguity_v2_pipeline_nl.md new file mode 100644 index 00000000000000..05d64757cb0842 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-requirements_ambiguity_v2_pipeline_nl.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Dutch, Flemish requirements_ambiguity_v2_pipeline pipeline BertForSequenceClassification from denizspynk +author: John Snow Labs +name: requirements_ambiguity_v2_pipeline +date: 2024-09-25 +tags: [nl, open_source, pipeline, onnx] +task: Text Classification +language: nl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`requirements_ambiguity_v2_pipeline` is a Dutch, Flemish model originally trained by denizspynk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/requirements_ambiguity_v2_pipeline_nl_5.5.0_3.0_1727267383613.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/requirements_ambiguity_v2_pipeline_nl_5.5.0_3.0_1727267383613.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("requirements_ambiguity_v2_pipeline", lang = "nl") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("requirements_ambiguity_v2_pipeline", lang = "nl") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|requirements_ambiguity_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|nl| +|Size:|409.0 MB| + +## References + +https://huggingface.co/denizspynk/requirements_ambiguity_v2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-response_score_en.md b/docs/_posts/ahmedlone127/2024-09-25-response_score_en.md new file mode 100644 index 00000000000000..339aff0e4a8b56 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-response_score_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English response_score BertForSequenceClassification from conversify +author: John Snow Labs +name: response_score +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`response_score` is a English model originally trained by conversify. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/response_score_en_5.5.0_3.0_1727279110839.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/response_score_en_5.5.0_3.0_1727279110839.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("response_score","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("response_score", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|response_score| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/conversify/response-score \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-robert_sst2_sentiment_full_en.md b/docs/_posts/ahmedlone127/2024-09-25-robert_sst2_sentiment_full_en.md new file mode 100644 index 00000000000000..9bc8a833c1f316 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-robert_sst2_sentiment_full_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English robert_sst2_sentiment_full RoBertaForSequenceClassification from asm3515 +author: John Snow Labs +name: robert_sst2_sentiment_full +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robert_sst2_sentiment_full` is a English model originally trained by asm3515. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robert_sst2_sentiment_full_en_5.5.0_3.0_1727234101299.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robert_sst2_sentiment_full_en_5.5.0_3.0_1727234101299.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("robert_sst2_sentiment_full","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("robert_sst2_sentiment_full", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robert_sst2_sentiment_full| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|443.3 MB| + +## References + +https://huggingface.co/asm3515/Robert-sst2-sentiment-full \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-robert_sst2_sentiment_full_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-robert_sst2_sentiment_full_pipeline_en.md new file mode 100644 index 00000000000000..a0c0ca3c40b9e5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-robert_sst2_sentiment_full_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English robert_sst2_sentiment_full_pipeline pipeline RoBertaForSequenceClassification from asm3515 +author: John Snow Labs +name: robert_sst2_sentiment_full_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robert_sst2_sentiment_full_pipeline` is a English model originally trained by asm3515. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robert_sst2_sentiment_full_pipeline_en_5.5.0_3.0_1727234127250.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robert_sst2_sentiment_full_pipeline_en_5.5.0_3.0_1727234127250.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("robert_sst2_sentiment_full_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("robert_sst2_sentiment_full_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robert_sst2_sentiment_full_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|443.3 MB| + +## References + +https://huggingface.co/asm3515/Robert-sst2-sentiment-full + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-roberta_cased_poem_evalutation_en.md b/docs/_posts/ahmedlone127/2024-09-25-roberta_cased_poem_evalutation_en.md new file mode 100644 index 00000000000000..7cb193885e28f6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-roberta_cased_poem_evalutation_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_cased_poem_evalutation XlmRoBertaForSequenceClassification from numblilbug +author: John Snow Labs +name: roberta_cased_poem_evalutation +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_cased_poem_evalutation` is a English model originally trained by numblilbug. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_cased_poem_evalutation_en_5.5.0_3.0_1727229518168.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_cased_poem_evalutation_en_5.5.0_3.0_1727229518168.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("roberta_cased_poem_evalutation","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("roberta_cased_poem_evalutation", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_cased_poem_evalutation| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|800.6 MB| + +## References + +https://huggingface.co/numblilbug/roberta-cased-poem-evalutation \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-roberta_cased_poem_evalutation_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-roberta_cased_poem_evalutation_pipeline_en.md new file mode 100644 index 00000000000000..bebdbfe2ccfb7a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-roberta_cased_poem_evalutation_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_cased_poem_evalutation_pipeline pipeline XlmRoBertaForSequenceClassification from numblilbug +author: John Snow Labs +name: roberta_cased_poem_evalutation_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_cased_poem_evalutation_pipeline` is a English model originally trained by numblilbug. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_cased_poem_evalutation_pipeline_en_5.5.0_3.0_1727229640522.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_cased_poem_evalutation_pipeline_en_5.5.0_3.0_1727229640522.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_cased_poem_evalutation_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_cased_poem_evalutation_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_cased_poem_evalutation_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|800.7 MB| + +## References + +https://huggingface.co/numblilbug/roberta-cased-poem-evalutation + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-roberta_cws_assamese_en.md b/docs/_posts/ahmedlone127/2024-09-25-roberta_cws_assamese_en.md new file mode 100644 index 00000000000000..598bb471144fb0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-roberta_cws_assamese_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English roberta_cws_assamese BertForTokenClassification from tjspross +author: John Snow Labs +name: roberta_cws_assamese +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_cws_assamese` is a English model originally trained by tjspross. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_cws_assamese_en_5.5.0_3.0_1727247216317.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_cws_assamese_en_5.5.0_3.0_1727247216317.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("roberta_cws_assamese","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("roberta_cws_assamese", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_cws_assamese| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/tjspross/roberta_cws_as \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-roberta_cws_assamese_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-roberta_cws_assamese_pipeline_en.md new file mode 100644 index 00000000000000..590942d60c19d3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-roberta_cws_assamese_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_cws_assamese_pipeline pipeline BertForTokenClassification from tjspross +author: John Snow Labs +name: roberta_cws_assamese_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_cws_assamese_pipeline` is a English model originally trained by tjspross. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_cws_assamese_pipeline_en_5.5.0_3.0_1727247276609.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_cws_assamese_pipeline_en_5.5.0_3.0_1727247276609.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_cws_assamese_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_cws_assamese_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_cws_assamese_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/tjspross/roberta_cws_as + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-roberta_cws_pku_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-roberta_cws_pku_pipeline_en.md new file mode 100644 index 00000000000000..ef3443da8ba3a3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-roberta_cws_pku_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English roberta_cws_pku_pipeline pipeline BertForTokenClassification from tjspross +author: John Snow Labs +name: roberta_cws_pku_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`roberta_cws_pku_pipeline` is a English model originally trained by tjspross. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/roberta_cws_pku_pipeline_en_5.5.0_3.0_1727265373986.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/roberta_cws_pku_pipeline_en_5.5.0_3.0_1727265373986.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("roberta_cws_pku_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("roberta_cws_pku_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|roberta_cws_pku_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/tjspross/roberta_cws_pku + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-robust_bert_yelp_en.md b/docs/_posts/ahmedlone127/2024-09-25-robust_bert_yelp_en.md new file mode 100644 index 00000000000000..068c779df22ceb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-robust_bert_yelp_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English robust_bert_yelp BertForSequenceClassification from JiaqiLee +author: John Snow Labs +name: robust_bert_yelp +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`robust_bert_yelp` is a English model originally trained by JiaqiLee. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/robust_bert_yelp_en_5.5.0_3.0_1727235386984.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/robust_bert_yelp_en_5.5.0_3.0_1727235386984.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("robust_bert_yelp","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("robust_bert_yelp", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|robust_bert_yelp| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/JiaqiLee/robust-bert-yelp \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-rubert_tiny2_russian_financial_sentiment_pipeline_ru.md b/docs/_posts/ahmedlone127/2024-09-25-rubert_tiny2_russian_financial_sentiment_pipeline_ru.md new file mode 100644 index 00000000000000..b14bdf015134fc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-rubert_tiny2_russian_financial_sentiment_pipeline_ru.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Russian rubert_tiny2_russian_financial_sentiment_pipeline pipeline BertForSequenceClassification from mxlcw +author: John Snow Labs +name: rubert_tiny2_russian_financial_sentiment_pipeline +date: 2024-09-25 +tags: [ru, open_source, pipeline, onnx] +task: Text Classification +language: ru +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rubert_tiny2_russian_financial_sentiment_pipeline` is a Russian model originally trained by mxlcw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rubert_tiny2_russian_financial_sentiment_pipeline_ru_5.5.0_3.0_1727268672632.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rubert_tiny2_russian_financial_sentiment_pipeline_ru_5.5.0_3.0_1727268672632.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("rubert_tiny2_russian_financial_sentiment_pipeline", lang = "ru") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("rubert_tiny2_russian_financial_sentiment_pipeline", lang = "ru") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rubert_tiny2_russian_financial_sentiment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ru| +|Size:|109.5 MB| + +## References + +https://huggingface.co/mxlcw/rubert-tiny2-russian-financial-sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-rubert_tiny2_russian_financial_sentiment_ru.md b/docs/_posts/ahmedlone127/2024-09-25-rubert_tiny2_russian_financial_sentiment_ru.md new file mode 100644 index 00000000000000..820494880f026a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-rubert_tiny2_russian_financial_sentiment_ru.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Russian rubert_tiny2_russian_financial_sentiment BertForSequenceClassification from mxlcw +author: John Snow Labs +name: rubert_tiny2_russian_financial_sentiment +date: 2024-09-25 +tags: [ru, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: ru +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`rubert_tiny2_russian_financial_sentiment` is a Russian model originally trained by mxlcw. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/rubert_tiny2_russian_financial_sentiment_ru_5.5.0_3.0_1727268666371.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/rubert_tiny2_russian_financial_sentiment_ru_5.5.0_3.0_1727268666371.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("rubert_tiny2_russian_financial_sentiment","ru") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("rubert_tiny2_russian_financial_sentiment", "ru") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|rubert_tiny2_russian_financial_sentiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|ru| +|Size:|109.5 MB| + +## References + +https://huggingface.co/mxlcw/rubert-tiny2-russian-financial-sentiment \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-ruberttiny_multiclassv1_en.md b/docs/_posts/ahmedlone127/2024-09-25-ruberttiny_multiclassv1_en.md new file mode 100644 index 00000000000000..34aeaf44f53a6d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-ruberttiny_multiclassv1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ruberttiny_multiclassv1 BertForSequenceClassification from Shakhovak +author: John Snow Labs +name: ruberttiny_multiclassv1 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ruberttiny_multiclassv1` is a English model originally trained by Shakhovak. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ruberttiny_multiclassv1_en_5.5.0_3.0_1727261047373.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ruberttiny_multiclassv1_en_5.5.0_3.0_1727261047373.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("ruberttiny_multiclassv1","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("ruberttiny_multiclassv1", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ruberttiny_multiclassv1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|109.5 MB| + +## References + +https://huggingface.co/Shakhovak/ruBertTiny_multiclassv1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-russscholar_seeker_en.md b/docs/_posts/ahmedlone127/2024-09-25-russscholar_seeker_en.md new file mode 100644 index 00000000000000..baf1cb48a0b8d9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-russscholar_seeker_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English russscholar_seeker BertForSequenceClassification from Gao-Tianci +author: John Snow Labs +name: russscholar_seeker +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`russscholar_seeker` is a English model originally trained by Gao-Tianci. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/russscholar_seeker_en_5.5.0_3.0_1727263668402.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/russscholar_seeker_en_5.5.0_3.0_1727263668402.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("russscholar_seeker","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("russscholar_seeker", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|russscholar_seeker| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Gao-Tianci/RussScholar-Seeker \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sarcasm_detection_bert_base_uncased_cree_en.md b/docs/_posts/ahmedlone127/2024-09-25-sarcasm_detection_bert_base_uncased_cree_en.md new file mode 100644 index 00000000000000..d5e4e5c575031f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sarcasm_detection_bert_base_uncased_cree_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sarcasm_detection_bert_base_uncased_cree BertForSequenceClassification from jkhan447 +author: John Snow Labs +name: sarcasm_detection_bert_base_uncased_cree +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sarcasm_detection_bert_base_uncased_cree` is a English model originally trained by jkhan447. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sarcasm_detection_bert_base_uncased_cree_en_5.5.0_3.0_1727263812158.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sarcasm_detection_bert_base_uncased_cree_en_5.5.0_3.0_1727263812158.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("sarcasm_detection_bert_base_uncased_cree","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("sarcasm_detection_bert_base_uncased_cree", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sarcasm_detection_bert_base_uncased_cree| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/jkhan447/sarcasm-detection-Bert-base-uncased-CR \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sarcasm_detection_bert_base_uncased_cree_sayula_popoluca_en.md b/docs/_posts/ahmedlone127/2024-09-25-sarcasm_detection_bert_base_uncased_cree_sayula_popoluca_en.md new file mode 100644 index 00000000000000..b486e8e9d57565 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sarcasm_detection_bert_base_uncased_cree_sayula_popoluca_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sarcasm_detection_bert_base_uncased_cree_sayula_popoluca BertForSequenceClassification from jkhan447 +author: John Snow Labs +name: sarcasm_detection_bert_base_uncased_cree_sayula_popoluca +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sarcasm_detection_bert_base_uncased_cree_sayula_popoluca` is a English model originally trained by jkhan447. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sarcasm_detection_bert_base_uncased_cree_sayula_popoluca_en_5.5.0_3.0_1727269746904.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sarcasm_detection_bert_base_uncased_cree_sayula_popoluca_en_5.5.0_3.0_1727269746904.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("sarcasm_detection_bert_base_uncased_cree_sayula_popoluca","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("sarcasm_detection_bert_base_uncased_cree_sayula_popoluca", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sarcasm_detection_bert_base_uncased_cree_sayula_popoluca| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/jkhan447/sarcasm-detection-Bert-base-uncased-CR-POS \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sarcasm_detection_bert_base_uncased_cree_sayula_popoluca_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sarcasm_detection_bert_base_uncased_cree_sayula_popoluca_pipeline_en.md new file mode 100644 index 00000000000000..dbb32bbe15575d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sarcasm_detection_bert_base_uncased_cree_sayula_popoluca_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sarcasm_detection_bert_base_uncased_cree_sayula_popoluca_pipeline pipeline BertForSequenceClassification from jkhan447 +author: John Snow Labs +name: sarcasm_detection_bert_base_uncased_cree_sayula_popoluca_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sarcasm_detection_bert_base_uncased_cree_sayula_popoluca_pipeline` is a English model originally trained by jkhan447. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sarcasm_detection_bert_base_uncased_cree_sayula_popoluca_pipeline_en_5.5.0_3.0_1727269769545.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sarcasm_detection_bert_base_uncased_cree_sayula_popoluca_pipeline_en_5.5.0_3.0_1727269769545.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sarcasm_detection_bert_base_uncased_cree_sayula_popoluca_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sarcasm_detection_bert_base_uncased_cree_sayula_popoluca_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sarcasm_detection_bert_base_uncased_cree_sayula_popoluca_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/jkhan447/sarcasm-detection-Bert-base-uncased-CR-POS + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-segbert_en.md b/docs/_posts/ahmedlone127/2024-09-25-segbert_en.md new file mode 100644 index 00000000000000..40c83e8ca39c83 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-segbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English segbert BertForTokenClassification from gMask +author: John Snow Labs +name: segbert +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`segbert` is a English model originally trained by gMask. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/segbert_en_5.5.0_3.0_1727272315504.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/segbert_en_5.5.0_3.0_1727272315504.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("segbert","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("segbert", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|segbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|412.6 MB| + +## References + +https://huggingface.co/gMask/SegBERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-segbert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-segbert_pipeline_en.md new file mode 100644 index 00000000000000..e0eebd6d3a5959 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-segbert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English segbert_pipeline pipeline BertForTokenClassification from gMask +author: John Snow Labs +name: segbert_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`segbert_pipeline` is a English model originally trained by gMask. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/segbert_pipeline_en_5.5.0_3.0_1727272337252.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/segbert_pipeline_en_5.5.0_3.0_1727272337252.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("segbert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("segbert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|segbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|412.6 MB| + +## References + +https://huggingface.co/gMask/SegBERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sembr2023_bert_mini_en.md b/docs/_posts/ahmedlone127/2024-09-25-sembr2023_bert_mini_en.md new file mode 100644 index 00000000000000..ee6d88576d0a99 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sembr2023_bert_mini_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sembr2023_bert_mini BertForTokenClassification from admko +author: John Snow Labs +name: sembr2023_bert_mini +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sembr2023_bert_mini` is a English model originally trained by admko. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sembr2023_bert_mini_en_5.5.0_3.0_1727271738534.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sembr2023_bert_mini_en_5.5.0_3.0_1727271738534.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("sembr2023_bert_mini","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("sembr2023_bert_mini", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sembr2023_bert_mini| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|41.9 MB| + +## References + +https://huggingface.co/admko/sembr2023-bert-mini \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sembr2023_bert_mini_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sembr2023_bert_mini_pipeline_en.md new file mode 100644 index 00000000000000..16a1726ec7bd4f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sembr2023_bert_mini_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sembr2023_bert_mini_pipeline pipeline BertForTokenClassification from admko +author: John Snow Labs +name: sembr2023_bert_mini_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sembr2023_bert_mini_pipeline` is a English model originally trained by admko. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sembr2023_bert_mini_pipeline_en_5.5.0_3.0_1727271741023.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sembr2023_bert_mini_pipeline_en_5.5.0_3.0_1727271741023.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sembr2023_bert_mini_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sembr2023_bert_mini_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sembr2023_bert_mini_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|41.9 MB| + +## References + +https://huggingface.co/admko/sembr2023-bert-mini + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_arabert_ar.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_arabert_ar.md new file mode 100644 index 00000000000000..3146c5f1b5dce4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_arabert_ar.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Arabic sent_bert_base_arabert BertSentenceEmbeddings from aubmindlab +author: John Snow Labs +name: sent_bert_base_arabert +date: 2024-09-25 +tags: [ar, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: ar +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_arabert` is a Arabic model originally trained by aubmindlab. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_arabert_ar_5.5.0_3.0_1727252183947.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_arabert_ar_5.5.0_3.0_1727252183947.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_arabert","ar") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_arabert","ar") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_arabert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|ar| +|Size:|504.6 MB| + +## References + +https://huggingface.co/aubmindlab/bert-base-arabert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_cased_finetuned_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_cased_finetuned_en.md new file mode 100644 index 00000000000000..3e1bd2ecc876ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_cased_finetuned_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_cased_finetuned BertSentenceEmbeddings from GusNicho +author: John Snow Labs +name: sent_bert_base_cased_finetuned +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_cased_finetuned` is a English model originally trained by GusNicho. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_cased_finetuned_en_5.5.0_3.0_1727248585487.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_cased_finetuned_en_5.5.0_3.0_1727248585487.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_cased_finetuned","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_cased_finetuned","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_cased_finetuned| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|403.6 MB| + +## References + +https://huggingface.co/GusNicho/bert-base-cased-finetuned \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_cased_finetuned_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_cased_finetuned_pipeline_en.md new file mode 100644 index 00000000000000..e66ba1d82a3bc5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_cased_finetuned_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_cased_finetuned_pipeline pipeline BertSentenceEmbeddings from GusNicho +author: John Snow Labs +name: sent_bert_base_cased_finetuned_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_cased_finetuned_pipeline` is a English model originally trained by GusNicho. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_cased_finetuned_pipeline_en_5.5.0_3.0_1727248606505.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_cased_finetuned_pipeline_en_5.5.0_3.0_1727248606505.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_cased_finetuned_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_cased_finetuned_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_cased_finetuned_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|404.2 MB| + +## References + +https://huggingface.co/GusNicho/bert-base-cased-finetuned + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_french_spanish_portuguese_italian_cased_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_french_spanish_portuguese_italian_cased_en.md new file mode 100644 index 00000000000000..7201288827bc68 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_french_spanish_portuguese_italian_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_english_french_spanish_portuguese_italian_cased BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_french_spanish_portuguese_italian_cased +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_french_spanish_portuguese_italian_cased` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_french_spanish_portuguese_italian_cased_en_5.5.0_3.0_1727252787834.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_french_spanish_portuguese_italian_cased_en_5.5.0_3.0_1727252787834.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_french_spanish_portuguese_italian_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_french_spanish_portuguese_italian_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_french_spanish_portuguese_italian_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|444.7 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-fr-es-pt-it-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_french_spanish_portuguese_italian_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_french_spanish_portuguese_italian_cased_pipeline_en.md new file mode 100644 index 00000000000000..b93dc47b15e02f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_french_spanish_portuguese_italian_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_english_french_spanish_portuguese_italian_cased_pipeline pipeline BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_french_spanish_portuguese_italian_cased_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_french_spanish_portuguese_italian_cased_pipeline` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_french_spanish_portuguese_italian_cased_pipeline_en_5.5.0_3.0_1727252810824.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_french_spanish_portuguese_italian_cased_pipeline_en_5.5.0_3.0_1727252810824.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_english_french_spanish_portuguese_italian_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_english_french_spanish_portuguese_italian_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_french_spanish_portuguese_italian_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|445.2 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-fr-es-pt-it-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_greek_modern_cased_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_greek_modern_cased_en.md new file mode 100644 index 00000000000000..149451b12b97ee --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_greek_modern_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_english_greek_modern_cased BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_greek_modern_cased +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_greek_modern_cased` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_greek_modern_cased_en_5.5.0_3.0_1727249089682.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_greek_modern_cased_en_5.5.0_3.0_1727249089682.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_greek_modern_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_greek_modern_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_greek_modern_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|408.4 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-el-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_greek_modern_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_greek_modern_cased_pipeline_en.md new file mode 100644 index 00000000000000..aa582455c436a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_greek_modern_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_english_greek_modern_cased_pipeline pipeline BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_greek_modern_cased_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_greek_modern_cased_pipeline` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_greek_modern_cased_pipeline_en_5.5.0_3.0_1727249112368.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_greek_modern_cased_pipeline_en_5.5.0_3.0_1727249112368.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_english_greek_modern_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_english_greek_modern_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_greek_modern_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.0 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-el-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_romanian_cased_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_romanian_cased_en.md new file mode 100644 index 00000000000000..bf34f70ca72761 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_romanian_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_english_romanian_cased BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_romanian_cased +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_romanian_cased` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_romanian_cased_en_5.5.0_3.0_1727249389571.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_romanian_cased_en_5.5.0_3.0_1727249389571.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_romanian_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_romanian_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_romanian_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|413.3 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-ro-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_romanian_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_romanian_cased_pipeline_en.md new file mode 100644 index 00000000000000..ffa07fd104257d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_romanian_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_english_romanian_cased_pipeline pipeline BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_romanian_cased_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_romanian_cased_pipeline` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_romanian_cased_pipeline_en_5.5.0_3.0_1727249410986.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_romanian_cased_pipeline_en_5.5.0_3.0_1727249410986.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_english_romanian_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_english_romanian_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_romanian_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|413.8 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-ro-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_swahili_cased_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_swahili_cased_en.md new file mode 100644 index 00000000000000..2f710361e7a051 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_swahili_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_english_swahili_cased BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_swahili_cased +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_swahili_cased` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_swahili_cased_en_5.5.0_3.0_1727252203082.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_swahili_cased_en_5.5.0_3.0_1727252203082.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_swahili_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_swahili_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_swahili_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|408.7 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-sw-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_urdu_cased_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_urdu_cased_en.md new file mode 100644 index 00000000000000..92fe9b68604874 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_english_urdu_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_english_urdu_cased BertSentenceEmbeddings from Geotrend +author: John Snow Labs +name: sent_bert_base_english_urdu_cased +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_english_urdu_cased` is a English model originally trained by Geotrend. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_urdu_cased_en_5.5.0_3.0_1727256669262.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_english_urdu_cased_en_5.5.0_3.0_1727256669262.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_urdu_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_english_urdu_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_english_urdu_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|410.7 MB| + +## References + +https://huggingface.co/Geotrend/bert-base-en-ur-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy_en.md new file mode 100644 index 00000000000000..d0933c512eddb1 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy BertSentenceEmbeddings from snousias +author: John Snow Labs +name: sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy` is a English model originally trained by snousias. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy_en_5.5.0_3.0_1727252727733.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy_en_5.5.0_3.0_1727252727733.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|421.1 MB| + +## References + +https://huggingface.co/snousias/bert-base-greek-uncased-v5-finetuned-polylex-mg \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy_pipeline_en.md new file mode 100644 index 00000000000000..d961b5455089e5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy_pipeline pipeline BertSentenceEmbeddings from snousias +author: John Snow Labs +name: sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy_pipeline` is a English model originally trained by snousias. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy_pipeline_en_5.5.0_3.0_1727252749532.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy_pipeline_en_5.5.0_3.0_1727252749532.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_greek_uncased_v5_finetuned_polylex_malagasy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|421.7 MB| + +## References + +https://huggingface.co/snousias/bert-base-greek-uncased-v5-finetuned-polylex-mg + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_macedonian_cased_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_macedonian_cased_en.md new file mode 100644 index 00000000000000..24aa490575cdcd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_macedonian_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_macedonian_cased BertSentenceEmbeddings from anon-submission-mk +author: John Snow Labs +name: sent_bert_base_macedonian_cased +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_macedonian_cased` is a English model originally trained by anon-submission-mk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_macedonian_cased_en_5.5.0_3.0_1727251966901.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_macedonian_cased_en_5.5.0_3.0_1727251966901.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_macedonian_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_macedonian_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_macedonian_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|412.2 MB| + +## References + +https://huggingface.co/anon-submission-mk/bert-base-macedonian-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_macedonian_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_macedonian_cased_pipeline_en.md new file mode 100644 index 00000000000000..65d6c72fc58dab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_macedonian_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_macedonian_cased_pipeline pipeline BertSentenceEmbeddings from anon-submission-mk +author: John Snow Labs +name: sent_bert_base_macedonian_cased_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_macedonian_cased_pipeline` is a English model originally trained by anon-submission-mk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_macedonian_cased_pipeline_en_5.5.0_3.0_1727251988928.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_macedonian_cased_pipeline_en_5.5.0_3.0_1727251988928.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_macedonian_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_macedonian_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_macedonian_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|412.8 MB| + +## References + +https://huggingface.co/anon-submission-mk/bert-base-macedonian-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_portuguese_cased_finetuned_tcu_acordaos_pipeline_pt.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_portuguese_cased_finetuned_tcu_acordaos_pipeline_pt.md new file mode 100644 index 00000000000000..1c01c6653f6339 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_portuguese_cased_finetuned_tcu_acordaos_pipeline_pt.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Portuguese sent_bert_base_portuguese_cased_finetuned_tcu_acordaos_pipeline pipeline BertSentenceEmbeddings from Luciano +author: John Snow Labs +name: sent_bert_base_portuguese_cased_finetuned_tcu_acordaos_pipeline +date: 2024-09-25 +tags: [pt, open_source, pipeline, onnx] +task: Embeddings +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_portuguese_cased_finetuned_tcu_acordaos_pipeline` is a Portuguese model originally trained by Luciano. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_portuguese_cased_finetuned_tcu_acordaos_pipeline_pt_5.5.0_3.0_1727252674645.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_portuguese_cased_finetuned_tcu_acordaos_pipeline_pt_5.5.0_3.0_1727252674645.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_portuguese_cased_finetuned_tcu_acordaos_pipeline", lang = "pt") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_portuguese_cased_finetuned_tcu_acordaos_pipeline", lang = "pt") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_portuguese_cased_finetuned_tcu_acordaos_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|pt| +|Size:|406.5 MB| + +## References + +https://huggingface.co/Luciano/bert-base-portuguese-cased-finetuned-tcu-acordaos + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_portuguese_cased_finetuned_tcu_acordaos_pt.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_portuguese_cased_finetuned_tcu_acordaos_pt.md new file mode 100644 index 00000000000000..b0ecccea9bde39 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_portuguese_cased_finetuned_tcu_acordaos_pt.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Portuguese sent_bert_base_portuguese_cased_finetuned_tcu_acordaos BertSentenceEmbeddings from Luciano +author: John Snow Labs +name: sent_bert_base_portuguese_cased_finetuned_tcu_acordaos +date: 2024-09-25 +tags: [pt, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_portuguese_cased_finetuned_tcu_acordaos` is a Portuguese model originally trained by Luciano. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_portuguese_cased_finetuned_tcu_acordaos_pt_5.5.0_3.0_1727252653031.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_portuguese_cased_finetuned_tcu_acordaos_pt_5.5.0_3.0_1727252653031.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_portuguese_cased_finetuned_tcu_acordaos","pt") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_portuguese_cased_finetuned_tcu_acordaos","pt") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_portuguese_cased_finetuned_tcu_acordaos| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|pt| +|Size:|405.9 MB| + +## References + +https://huggingface.co/Luciano/bert-base-portuguese-cased-finetuned-tcu-acordaos \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_spanish_wwm_cased_finetuned_literature_pro_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_spanish_wwm_cased_finetuned_literature_pro_en.md new file mode 100644 index 00000000000000..7c19178fe4b1f0 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_spanish_wwm_cased_finetuned_literature_pro_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_spanish_wwm_cased_finetuned_literature_pro BertSentenceEmbeddings from a-v-bely +author: John Snow Labs +name: sent_bert_base_spanish_wwm_cased_finetuned_literature_pro +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_spanish_wwm_cased_finetuned_literature_pro` is a English model originally trained by a-v-bely. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_spanish_wwm_cased_finetuned_literature_pro_en_5.5.0_3.0_1727251984291.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_spanish_wwm_cased_finetuned_literature_pro_en_5.5.0_3.0_1727251984291.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_spanish_wwm_cased_finetuned_literature_pro","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_spanish_wwm_cased_finetuned_literature_pro","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_spanish_wwm_cased_finetuned_literature_pro| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/a-v-bely/bert-base-spanish-wwm-cased-finetuned-literature-pro \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_spanish_wwm_cased_finetuned_literature_pro_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_spanish_wwm_cased_finetuned_literature_pro_pipeline_en.md new file mode 100644 index 00000000000000..68fce6150841f7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_spanish_wwm_cased_finetuned_literature_pro_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_spanish_wwm_cased_finetuned_literature_pro_pipeline pipeline BertSentenceEmbeddings from a-v-bely +author: John Snow Labs +name: sent_bert_base_spanish_wwm_cased_finetuned_literature_pro_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_spanish_wwm_cased_finetuned_literature_pro_pipeline` is a English model originally trained by a-v-bely. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_spanish_wwm_cased_finetuned_literature_pro_pipeline_en_5.5.0_3.0_1727252006287.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_spanish_wwm_cased_finetuned_literature_pro_pipeline_en_5.5.0_3.0_1727252006287.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_spanish_wwm_cased_finetuned_literature_pro_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_spanish_wwm_cased_finetuned_literature_pro_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_spanish_wwm_cased_finetuned_literature_pro_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|410.0 MB| + +## References + +https://huggingface.co/a-v-bely/bert-base-spanish-wwm-cased-finetuned-literature-pro + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_1802_r1_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_1802_r1_en.md new file mode 100644 index 00000000000000..45977fbe03d8e7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_1802_r1_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_1802_r1 BertSentenceEmbeddings from JamesKim +author: John Snow Labs +name: sent_bert_base_uncased_1802_r1 +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_1802_r1` is a English model originally trained by JamesKim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_1802_r1_en_5.5.0_3.0_1727252925708.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_1802_r1_en_5.5.0_3.0_1727252925708.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_1802_r1","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_1802_r1","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_1802_r1| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/JamesKim/bert-base-uncased_1802_r1 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_1802_r1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_1802_r1_pipeline_en.md new file mode 100644 index 00000000000000..bb25b0e09ec5e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_1802_r1_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_1802_r1_pipeline pipeline BertSentenceEmbeddings from JamesKim +author: John Snow Labs +name: sent_bert_base_uncased_1802_r1_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_1802_r1_pipeline` is a English model originally trained by JamesKim. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_1802_r1_pipeline_en_5.5.0_3.0_1727252950127.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_1802_r1_pipeline_en_5.5.0_3.0_1727252950127.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_1802_r1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_1802_r1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_1802_r1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/JamesKim/bert-base-uncased_1802_r1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_finetuned_wallisian_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_finetuned_wallisian_en.md new file mode 100644 index 00000000000000..caea42250cb8b2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_finetuned_wallisian_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_wallisian BertSentenceEmbeddings from btamm12 +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_wallisian +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_wallisian` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wallisian_en_5.5.0_3.0_1727248679997.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wallisian_en_5.5.0_3.0_1727248679997.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_wallisian","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_wallisian","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_wallisian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/btamm12/bert-base-uncased-finetuned-wls \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower_en.md new file mode 100644 index 00000000000000..37b27e5e2888cc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower BertSentenceEmbeddings from btamm12 +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower_en_5.5.0_3.0_1727252904970.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower_en_5.5.0_3.0_1727252904970.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/btamm12/bert-base-uncased-finetuned-wls-manual-6ep-lower \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower_pipeline_en.md new file mode 100644 index 00000000000000..20fcaece26fde7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower_pipeline pipeline BertSentenceEmbeddings from btamm12 +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower_pipeline` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower_pipeline_en_5.5.0_3.0_1727252927351.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower_pipeline_en_5.5.0_3.0_1727252927351.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_wallisian_manual_6ep_lower_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/btamm12/bert-base-uncased-finetuned-wls-manual-6ep-lower + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower_en.md new file mode 100644 index 00000000000000..ff63beeff82825 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower BertSentenceEmbeddings from btamm12 +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower_en_5.5.0_3.0_1727248768771.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower_en_5.5.0_3.0_1727248768771.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/btamm12/bert-base-uncased-finetuned-wls-manual-7ep-lower \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower_pipeline_en.md new file mode 100644 index 00000000000000..56b038e7bc33b3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower_pipeline pipeline BertSentenceEmbeddings from btamm12 +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower_pipeline` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower_pipeline_en_5.5.0_3.0_1727248790865.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower_pipeline_en_5.5.0_3.0_1727248790865.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_wallisian_manual_7ep_lower_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/btamm12/bert-base-uncased-finetuned-wls-manual-7ep-lower + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_finetuned_wallisian_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_finetuned_wallisian_pipeline_en.md new file mode 100644 index 00000000000000..349071eccba737 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_finetuned_wallisian_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_finetuned_wallisian_pipeline pipeline BertSentenceEmbeddings from btamm12 +author: John Snow Labs +name: sent_bert_base_uncased_finetuned_wallisian_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_finetuned_wallisian_pipeline` is a English model originally trained by btamm12. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wallisian_pipeline_en_5.5.0_3.0_1727248701470.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_finetuned_wallisian_pipeline_en_5.5.0_3.0_1727248701470.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_finetuned_wallisian_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_finetuned_wallisian_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_finetuned_wallisian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/btamm12/bert-base-uncased-finetuned-wls + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_issues_128_haesun_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_issues_128_haesun_en.md new file mode 100644 index 00000000000000..de2d6069cf6ffc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_issues_128_haesun_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_haesun BertSentenceEmbeddings from haesun +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_haesun +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_haesun` is a English model originally trained by haesun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_haesun_en_5.5.0_3.0_1727230745218.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_haesun_en_5.5.0_3.0_1727230745218.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_haesun","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_haesun","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_haesun| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/haesun/bert-base-uncased-issues-128 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_issues_128_haesun_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_issues_128_haesun_pipeline_en.md new file mode 100644 index 00000000000000..7338c5cc86c0f9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_issues_128_haesun_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_haesun_pipeline pipeline BertSentenceEmbeddings from haesun +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_haesun_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_haesun_pipeline` is a English model originally trained by haesun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_haesun_pipeline_en_5.5.0_3.0_1727230765829.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_haesun_pipeline_en_5.5.0_3.0_1727230765829.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_issues_128_haesun_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_issues_128_haesun_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_haesun_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/haesun/bert-base-uncased-issues-128 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_issues_128_mabrouk_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_issues_128_mabrouk_pipeline_en.md new file mode 100644 index 00000000000000..75f207533e8563 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_issues_128_mabrouk_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_mabrouk_pipeline pipeline BertSentenceEmbeddings from mabrouk +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_mabrouk_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_mabrouk_pipeline` is a English model originally trained by mabrouk. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_mabrouk_pipeline_en_5.5.0_3.0_1727251458188.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_mabrouk_pipeline_en_5.5.0_3.0_1727251458188.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_issues_128_mabrouk_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_issues_128_mabrouk_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_mabrouk_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/mabrouk/bert-base-uncased-issues-128 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_issues_128_robinsh2023_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_issues_128_robinsh2023_en.md new file mode 100644 index 00000000000000..25bbd73ed86536 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_issues_128_robinsh2023_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_robinsh2023 BertSentenceEmbeddings from Robinsh2023 +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_robinsh2023 +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_robinsh2023` is a English model originally trained by Robinsh2023. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_robinsh2023_en_5.5.0_3.0_1727234930301.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_robinsh2023_en_5.5.0_3.0_1727234930301.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_robinsh2023","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_issues_128_robinsh2023","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_robinsh2023| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.1 MB| + +## References + +https://huggingface.co/Robinsh2023/bert-base-uncased-issues-128 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_issues_128_robinsh2023_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_issues_128_robinsh2023_pipeline_en.md new file mode 100644 index 00000000000000..98eea9f8c85640 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_issues_128_robinsh2023_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_issues_128_robinsh2023_pipeline pipeline BertSentenceEmbeddings from Robinsh2023 +author: John Snow Labs +name: sent_bert_base_uncased_issues_128_robinsh2023_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_issues_128_robinsh2023_pipeline` is a English model originally trained by Robinsh2023. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_robinsh2023_pipeline_en_5.5.0_3.0_1727234951381.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_issues_128_robinsh2023_pipeline_en_5.5.0_3.0_1727234951381.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_issues_128_robinsh2023_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_issues_128_robinsh2023_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_issues_128_robinsh2023_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/Robinsh2023/bert-base-uncased-issues-128 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_malayalam_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_malayalam_en.md new file mode 100644 index 00000000000000..6fa96082c9f1ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_malayalam_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_malayalam BertSentenceEmbeddings from Tural +author: John Snow Labs +name: sent_bert_base_uncased_malayalam +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_malayalam` is a English model originally trained by Tural. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_malayalam_en_5.5.0_3.0_1727253078814.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_malayalam_en_5.5.0_3.0_1727253078814.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_malayalam","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_malayalam","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_malayalam| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.9 MB| + +## References + +https://huggingface.co/Tural/bert-base-uncased-ml \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_malayalam_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_malayalam_pipeline_en.md new file mode 100644 index 00000000000000..262f1e0f5a523f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_malayalam_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_malayalam_pipeline pipeline BertSentenceEmbeddings from Tural +author: John Snow Labs +name: sent_bert_base_uncased_malayalam_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_malayalam_pipeline` is a English model originally trained by Tural. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_malayalam_pipeline_en_5.5.0_3.0_1727253099895.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_malayalam_pipeline_en_5.5.0_3.0_1727253099895.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_malayalam_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_malayalam_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_malayalam_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.4 MB| + +## References + +https://huggingface.co/Tural/bert-base-uncased-ml + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_mlm_scirepeval_fos_chemistry_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_mlm_scirepeval_fos_chemistry_en.md new file mode 100644 index 00000000000000..a90a843650af30 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_mlm_scirepeval_fos_chemistry_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_mlm_scirepeval_fos_chemistry BertSentenceEmbeddings from jonas-luehrs +author: John Snow Labs +name: sent_bert_base_uncased_mlm_scirepeval_fos_chemistry +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_mlm_scirepeval_fos_chemistry` is a English model originally trained by jonas-luehrs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_mlm_scirepeval_fos_chemistry_en_5.5.0_3.0_1727234682485.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_mlm_scirepeval_fos_chemistry_en_5.5.0_3.0_1727234682485.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_mlm_scirepeval_fos_chemistry","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_mlm_scirepeval_fos_chemistry","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_mlm_scirepeval_fos_chemistry| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/jonas-luehrs/bert-base-uncased-MLM-scirepeval_fos_chemistry \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_mlm_scirepeval_fos_chemistry_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_mlm_scirepeval_fos_chemistry_pipeline_en.md new file mode 100644 index 00000000000000..5853799494ddbc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_mlm_scirepeval_fos_chemistry_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_mlm_scirepeval_fos_chemistry_pipeline pipeline BertSentenceEmbeddings from jonas-luehrs +author: John Snow Labs +name: sent_bert_base_uncased_mlm_scirepeval_fos_chemistry_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_mlm_scirepeval_fos_chemistry_pipeline` is a English model originally trained by jonas-luehrs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_mlm_scirepeval_fos_chemistry_pipeline_en_5.5.0_3.0_1727234703495.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_mlm_scirepeval_fos_chemistry_pipeline_en_5.5.0_3.0_1727234703495.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_mlm_scirepeval_fos_chemistry_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_mlm_scirepeval_fos_chemistry_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_mlm_scirepeval_fos_chemistry_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/jonas-luehrs/bert-base-uncased-MLM-scirepeval_fos_chemistry + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_model_attribution_challenge_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_model_attribution_challenge_en.md new file mode 100644 index 00000000000000..db7d519bf1b884 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_model_attribution_challenge_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_model_attribution_challenge BertSentenceEmbeddings from model-attribution-challenge +author: John Snow Labs +name: sent_bert_base_uncased_model_attribution_challenge +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_model_attribution_challenge` is a English model originally trained by model-attribution-challenge. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_model_attribution_challenge_en_5.5.0_3.0_1727252027969.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_model_attribution_challenge_en_5.5.0_3.0_1727252027969.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_model_attribution_challenge","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_model_attribution_challenge","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_model_attribution_challenge| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/model-attribution-challenge/bert-base-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_sclarge_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_sclarge_pipeline_en.md new file mode 100644 index 00000000000000..2e429c80c582ca --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_sclarge_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_sclarge_pipeline pipeline BertSentenceEmbeddings from CambridgeMolecularEngineering +author: John Snow Labs +name: sent_bert_base_uncased_sclarge_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_sclarge_pipeline` is a English model originally trained by CambridgeMolecularEngineering. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_sclarge_pipeline_en_5.5.0_3.0_1727230881067.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_sclarge_pipeline_en_5.5.0_3.0_1727230881067.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_sclarge_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_sclarge_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_sclarge_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/CambridgeMolecularEngineering/bert-base-uncased-sclarge + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_sijia_w_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_sijia_w_en.md new file mode 100644 index 00000000000000..8b3e59b073fb20 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_sijia_w_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_uncased_sijia_w BertSentenceEmbeddings from sijia-w +author: John Snow Labs +name: sent_bert_base_uncased_sijia_w +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_sijia_w` is a English model originally trained by sijia-w. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_sijia_w_en_5.5.0_3.0_1727234344591.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_sijia_w_en_5.5.0_3.0_1727234344591.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_sijia_w","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_uncased_sijia_w","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_sijia_w| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/sijia-w/bert-base-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_sijia_w_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_sijia_w_pipeline_en.md new file mode 100644 index 00000000000000..4c43e76264b8ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_uncased_sijia_w_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_base_uncased_sijia_w_pipeline pipeline BertSentenceEmbeddings from sijia-w +author: John Snow Labs +name: sent_bert_base_uncased_sijia_w_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_uncased_sijia_w_pipeline` is a English model originally trained by sijia-w. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_sijia_w_pipeline_en_5.5.0_3.0_1727234365461.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_uncased_sijia_w_pipeline_en_5.5.0_3.0_1727234365461.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_base_uncased_sijia_w_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_base_uncased_sijia_w_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_uncased_sijia_w_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/sijia-w/bert-base-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_vn_finetuned_portuguese_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_vn_finetuned_portuguese_en.md new file mode 100644 index 00000000000000..c4e012585d4772 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_base_vn_finetuned_portuguese_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_base_vn_finetuned_portuguese BertSentenceEmbeddings from dotansang +author: John Snow Labs +name: sent_bert_base_vn_finetuned_portuguese +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_base_vn_finetuned_portuguese` is a English model originally trained by dotansang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_base_vn_finetuned_portuguese_en_5.5.0_3.0_1727248589931.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_base_vn_finetuned_portuguese_en_5.5.0_3.0_1727248589931.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_vn_finetuned_portuguese","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_base_vn_finetuned_portuguese","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_base_vn_finetuned_portuguese| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|498.8 MB| + +## References + +https://huggingface.co/dotansang/bert-base-vn-finetuned-pt \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_large_cased_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_large_cased_en.md new file mode 100644 index 00000000000000..ba7aa35d488d88 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_large_cased_en.md @@ -0,0 +1,92 @@ +--- +layout: model +title: BERT Sentence Embeddings (Large Cased) +author: John Snow Labs +name: sent_bert_large_cased +date: 2024-09-25 +tags: [open_source, embeddings, en, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This model contains a deep bidirectional transformer trained on Wikipedia and the BookCorpus. The details are described in the paper "[BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding](https://arxiv.org/abs/1810.04805)". + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_en_5.5.0_3.0_1727230647828.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_en_5.5.0_3.0_1727230647828.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +... +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_large_cased", "en") \ +.setInputCols("sentence") \ +.setOutputCol("sentence_embeddings") +nlp_pipeline = Pipeline(stages=[document_assembler, sentence_detector, embeddings]) +pipeline_model = nlp_pipeline.fit(spark.createDataFrame([[""]]).toDF("text")) +result = pipeline_model.transform(spark.createDataFrame([['I hate cancer', "Antibiotics aren't painkiller"]], ["text"])) +``` +```scala +... +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_large_cased", "en") +.setInputCols("sentence") +.setOutputCol("sentence_embeddings") +val pipeline = new Pipeline().setStages(Array(document_assembler, sentence_detector, embeddings)) +val data = Seq("I hate cancer", "Antibiotics aren't painkiller").toDF("text") +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu + +text = ["I hate cancer", "Antibiotics aren't painkiller"] +embeddings_df = nlu.load('en.embed_sentence.bert_large_cased').predict(text, output_level='sentence') +embeddings_df +``` +
+ +## Results + +```bash + + token en_embed_sentence_bert_large_cased_embeddings + + I [[-0.6228358149528503, -0.3453695774078369, 0.... +love [[-0.6228358149528503, -0.3453695774078369, 0.... +NLP [[-0.6228358149528503, -0.3453695774078369, 0.... +``` + +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_large_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|1.2 GB| \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_large_cased_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_large_cased_pipeline_en.md new file mode 100644 index 00000000000000..6dcf322294a214 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_large_cased_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_large_cased_pipeline pipeline BertSentenceEmbeddings from google-bert +author: John Snow Labs +name: sent_bert_large_cased_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_large_cased_pipeline` is a English model originally trained by google-bert. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_pipeline_en_5.5.0_3.0_1727230709316.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_large_cased_pipeline_en_5.5.0_3.0_1727230709316.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_large_cased_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_large_cased_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_large_cased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/google-bert/bert-large-cased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_large_nli_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_large_nli_en.md new file mode 100644 index 00000000000000..1cc86107e298c9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_large_nli_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bert_large_nli BertSentenceEmbeddings from binwang +author: John Snow Labs +name: sent_bert_large_nli +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_large_nli` is a English model originally trained by binwang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_large_nli_en_5.5.0_3.0_1727251695723.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_large_nli_en_5.5.0_3.0_1727251695723.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_large_nli","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_large_nli","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_large_nli| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/binwang/bert-large-nli \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_large_nli_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_large_nli_pipeline_en.md new file mode 100644 index 00000000000000..df3914a2b6a4e8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_large_nli_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_bert_large_nli_pipeline pipeline BertSentenceEmbeddings from binwang +author: John Snow Labs +name: sent_bert_large_nli_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_large_nli_pipeline` is a English model originally trained by binwang. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_large_nli_pipeline_en_5.5.0_3.0_1727251760270.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_large_nli_pipeline_en_5.5.0_3.0_1727251760270.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_large_nli_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_large_nli_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_large_nli_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/binwang/bert-large-nli + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_large_portuguese_cased_legal_tsdae_pipeline_pt.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_large_portuguese_cased_legal_tsdae_pipeline_pt.md new file mode 100644 index 00000000000000..66d33ec146f0dd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_large_portuguese_cased_legal_tsdae_pipeline_pt.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Portuguese sent_bert_large_portuguese_cased_legal_tsdae_pipeline pipeline BertSentenceEmbeddings from stjiris +author: John Snow Labs +name: sent_bert_large_portuguese_cased_legal_tsdae_pipeline +date: 2024-09-25 +tags: [pt, open_source, pipeline, onnx] +task: Embeddings +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_large_portuguese_cased_legal_tsdae_pipeline` is a Portuguese model originally trained by stjiris. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_large_portuguese_cased_legal_tsdae_pipeline_pt_5.5.0_3.0_1727253371409.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_large_portuguese_cased_legal_tsdae_pipeline_pt_5.5.0_3.0_1727253371409.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_large_portuguese_cased_legal_tsdae_pipeline", lang = "pt") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_large_portuguese_cased_legal_tsdae_pipeline", lang = "pt") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_large_portuguese_cased_legal_tsdae_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|pt| +|Size:|1.2 GB| + +## References + +https://huggingface.co/stjiris/bert-large-portuguese-cased-legal-tsdae + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_tagalog_base_uncased_wwm_pipeline_tl.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_tagalog_base_uncased_wwm_pipeline_tl.md new file mode 100644 index 00000000000000..d454582e01fe82 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_tagalog_base_uncased_wwm_pipeline_tl.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Tagalog sent_bert_tagalog_base_uncased_wwm_pipeline pipeline BertSentenceEmbeddings from jcblaise +author: John Snow Labs +name: sent_bert_tagalog_base_uncased_wwm_pipeline +date: 2024-09-25 +tags: [tl, open_source, pipeline, onnx] +task: Embeddings +language: tl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_tagalog_base_uncased_wwm_pipeline` is a Tagalog model originally trained by jcblaise. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_tagalog_base_uncased_wwm_pipeline_tl_5.5.0_3.0_1727249292087.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_tagalog_base_uncased_wwm_pipeline_tl_5.5.0_3.0_1727249292087.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_bert_tagalog_base_uncased_wwm_pipeline", lang = "tl") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_bert_tagalog_base_uncased_wwm_pipeline", lang = "tl") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_tagalog_base_uncased_wwm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|tl| +|Size:|407.4 MB| + +## References + +https://huggingface.co/jcblaise/bert-tagalog-base-uncased-WWM + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bert_tagalog_base_uncased_wwm_tl.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_tagalog_base_uncased_wwm_tl.md new file mode 100644 index 00000000000000..d51dd28aebdf2d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bert_tagalog_base_uncased_wwm_tl.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Tagalog sent_bert_tagalog_base_uncased_wwm BertSentenceEmbeddings from jcblaise +author: John Snow Labs +name: sent_bert_tagalog_base_uncased_wwm +date: 2024-09-25 +tags: [tl, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: tl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bert_tagalog_base_uncased_wwm` is a Tagalog model originally trained by jcblaise. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bert_tagalog_base_uncased_wwm_tl_5.5.0_3.0_1727249270927.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bert_tagalog_base_uncased_wwm_tl_5.5.0_3.0_1727249270927.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bert_tagalog_base_uncased_wwm","tl") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bert_tagalog_base_uncased_wwm","tl") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bert_tagalog_base_uncased_wwm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|tl| +|Size:|406.9 MB| + +## References + +https://huggingface.co/jcblaise/bert-tagalog-base-uncased-WWM \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_bio_bert_base_spanish_wwm_cased_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_bio_bert_base_spanish_wwm_cased_en.md new file mode 100644 index 00000000000000..e6bfe710d33d2b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_bio_bert_base_spanish_wwm_cased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_bio_bert_base_spanish_wwm_cased BertSentenceEmbeddings from mrojas +author: John Snow Labs +name: sent_bio_bert_base_spanish_wwm_cased +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_bio_bert_base_spanish_wwm_cased` is a English model originally trained by mrojas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_bio_bert_base_spanish_wwm_cased_en_5.5.0_3.0_1727252456840.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_bio_bert_base_spanish_wwm_cased_en_5.5.0_3.0_1727252456840.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_bio_bert_base_spanish_wwm_cased","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_bio_bert_base_spanish_wwm_cased","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_bio_bert_base_spanish_wwm_cased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|409.0 MB| + +## References + +https://huggingface.co/mrojas/bio-bert-base-spanish-wwm-cased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_biobert_italian_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_biobert_italian_pipeline_en.md new file mode 100644 index 00000000000000..2fb0f7dc3d0159 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_biobert_italian_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_biobert_italian_pipeline pipeline BertSentenceEmbeddings from marcopost-it +author: John Snow Labs +name: sent_biobert_italian_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_biobert_italian_pipeline` is a English model originally trained by marcopost-it. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_biobert_italian_pipeline_en_5.5.0_3.0_1727249186643.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_biobert_italian_pipeline_en_5.5.0_3.0_1727249186643.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_biobert_italian_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_biobert_italian_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_biobert_italian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|410.8 MB| + +## References + +https://huggingface.co/marcopost-it/biobert-it + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_cl_arabertv0_1_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_cl_arabertv0_1_base_pipeline_en.md new file mode 100644 index 00000000000000..3eaec1d0fe89c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_cl_arabertv0_1_base_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_cl_arabertv0_1_base_pipeline pipeline BertSentenceEmbeddings from qahq +author: John Snow Labs +name: sent_cl_arabertv0_1_base_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_cl_arabertv0_1_base_pipeline` is a English model originally trained by qahq. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_cl_arabertv0_1_base_pipeline_en_5.5.0_3.0_1727251599249.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_cl_arabertv0_1_base_pipeline_en_5.5.0_3.0_1727251599249.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_cl_arabertv0_1_base_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_cl_arabertv0_1_base_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_cl_arabertv0_1_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|505.6 MB| + +## References + +https://huggingface.co/qahq/CL-AraBERTv0.1-base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb_en.md new file mode 100644 index 00000000000000..891717c5f332b2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb BertSentenceEmbeddings from cxfajar197 +author: John Snow Labs +name: sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb` is a English model originally trained by cxfajar197. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb_en_5.5.0_3.0_1727234600526.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb_en_5.5.0_3.0_1727234600526.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|665.1 MB| + +## References + +https://huggingface.co/cxfajar197/distilbert-base-uncased-finetuned-imdb-accelerate-finetuned-imdb \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb_pipeline_en.md new file mode 100644 index 00000000000000..a42686d8627174 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb_pipeline pipeline BertSentenceEmbeddings from cxfajar197 +author: John Snow Labs +name: sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb_pipeline` is a English model originally trained by cxfajar197. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb_pipeline_en_5.5.0_3.0_1727234634556.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb_pipeline_en_5.5.0_3.0_1727234634556.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_distilbert_base_uncased_finetuned_imdb_accelerate_finetuned_imdb_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|665.6 MB| + +## References + +https://huggingface.co/cxfajar197/distilbert-base-uncased-finetuned-imdb-accelerate-finetuned-imdb + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_dummy_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_dummy_pipeline_en.md new file mode 100644 index 00000000000000..dfa357963c3cfb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_dummy_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_dummy_pipeline pipeline BertSentenceEmbeddings from knight7561 +author: John Snow Labs +name: sent_dummy_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_dummy_pipeline` is a English model originally trained by knight7561. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_dummy_pipeline_en_5.5.0_3.0_1727252621212.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_dummy_pipeline_en_5.5.0_3.0_1727252621212.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_dummy_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_dummy_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_dummy_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/knight7561/dummy + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_fae_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_fae_en.md new file mode 100644 index 00000000000000..93fe1269803402 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_fae_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_fae BertSentenceEmbeddings from sereneWithU +author: John Snow Labs +name: sent_fae +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_fae` is a English model originally trained by sereneWithU. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_fae_en_5.5.0_3.0_1727252735434.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_fae_en_5.5.0_3.0_1727252735434.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_fae","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_fae","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_fae| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/sereneWithU/FAE \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_fae_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_fae_pipeline_en.md new file mode 100644 index 00000000000000..035a0338e697a2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_fae_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_fae_pipeline pipeline BertSentenceEmbeddings from sereneWithU +author: John Snow Labs +name: sent_fae_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_fae_pipeline` is a English model originally trained by sereneWithU. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_fae_pipeline_en_5.5.0_3.0_1727252808462.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_fae_pipeline_en_5.5.0_3.0_1727252808462.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_fae_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_fae_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_fae_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.2 GB| + +## References + +https://huggingface.co/sereneWithU/FAE + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_fine_tune_bert_mlm_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_fine_tune_bert_mlm_en.md new file mode 100644 index 00000000000000..840b4446718487 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_fine_tune_bert_mlm_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_fine_tune_bert_mlm BertSentenceEmbeddings from mjavadmt +author: John Snow Labs +name: sent_fine_tune_bert_mlm +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_fine_tune_bert_mlm` is a English model originally trained by mjavadmt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_fine_tune_bert_mlm_en_5.5.0_3.0_1727230529238.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_fine_tune_bert_mlm_en_5.5.0_3.0_1727230529238.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_fine_tune_bert_mlm","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_fine_tune_bert_mlm","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_fine_tune_bert_mlm| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|606.4 MB| + +## References + +https://huggingface.co/mjavadmt/fine-tune-BERT-MLM \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_fine_tune_bert_mlm_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_fine_tune_bert_mlm_pipeline_en.md new file mode 100644 index 00000000000000..24027cf165f4da --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_fine_tune_bert_mlm_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_fine_tune_bert_mlm_pipeline pipeline BertSentenceEmbeddings from mjavadmt +author: John Snow Labs +name: sent_fine_tune_bert_mlm_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_fine_tune_bert_mlm_pipeline` is a English model originally trained by mjavadmt. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_fine_tune_bert_mlm_pipeline_en_5.5.0_3.0_1727230560544.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_fine_tune_bert_mlm_pipeline_en_5.5.0_3.0_1727230560544.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_fine_tune_bert_mlm_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_fine_tune_bert_mlm_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_fine_tune_bert_mlm_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|607.0 MB| + +## References + +https://huggingface.co/mjavadmt/fine-tune-BERT-MLM + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_guidebias_bert_base_uncased_gender_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_guidebias_bert_base_uncased_gender_en.md new file mode 100644 index 00000000000000..84844671ddba3f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_guidebias_bert_base_uncased_gender_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_guidebias_bert_base_uncased_gender BertSentenceEmbeddings from squiduu +author: John Snow Labs +name: sent_guidebias_bert_base_uncased_gender +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_guidebias_bert_base_uncased_gender` is a English model originally trained by squiduu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_guidebias_bert_base_uncased_gender_en_5.5.0_3.0_1727230418497.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_guidebias_bert_base_uncased_gender_en_5.5.0_3.0_1727230418497.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_guidebias_bert_base_uncased_gender","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_guidebias_bert_base_uncased_gender","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_guidebias_bert_base_uncased_gender| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/squiduu/guidebias-bert-base-uncased-gender \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_gujibert_jian_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_gujibert_jian_pipeline_en.md new file mode 100644 index 00000000000000..a9492e20a36300 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_gujibert_jian_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_gujibert_jian_pipeline pipeline BertSentenceEmbeddings from hsc748NLP +author: John Snow Labs +name: sent_gujibert_jian_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_gujibert_jian_pipeline` is a English model originally trained by hsc748NLP. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_gujibert_jian_pipeline_en_5.5.0_3.0_1727252975988.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_gujibert_jian_pipeline_en_5.5.0_3.0_1727252975988.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_gujibert_jian_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_gujibert_jian_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_gujibert_jian_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|420.8 MB| + +## References + +https://huggingface.co/hsc748NLP/GujiBERT_jian + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_hindi_marathi_dev_bert_hi.md b/docs/_posts/ahmedlone127/2024-09-25-sent_hindi_marathi_dev_bert_hi.md new file mode 100644 index 00000000000000..4a418b158b4128 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_hindi_marathi_dev_bert_hi.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Hindi sent_hindi_marathi_dev_bert BertSentenceEmbeddings from l3cube-pune +author: John Snow Labs +name: sent_hindi_marathi_dev_bert +date: 2024-09-25 +tags: [hi, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_hindi_marathi_dev_bert` is a Hindi model originally trained by l3cube-pune. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_hindi_marathi_dev_bert_hi_5.5.0_3.0_1727252023290.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_hindi_marathi_dev_bert_hi_5.5.0_3.0_1727252023290.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_hindi_marathi_dev_bert","hi") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_hindi_marathi_dev_bert","hi") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_hindi_marathi_dev_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|hi| +|Size:|890.7 MB| + +## References + +https://huggingface.co/l3cube-pune/hindi-marathi-dev-bert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_hindi_marathi_dev_bert_pipeline_hi.md b/docs/_posts/ahmedlone127/2024-09-25-sent_hindi_marathi_dev_bert_pipeline_hi.md new file mode 100644 index 00000000000000..efc034c3aeb5ce --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_hindi_marathi_dev_bert_pipeline_hi.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Hindi sent_hindi_marathi_dev_bert_pipeline pipeline BertSentenceEmbeddings from l3cube-pune +author: John Snow Labs +name: sent_hindi_marathi_dev_bert_pipeline +date: 2024-09-25 +tags: [hi, open_source, pipeline, onnx] +task: Embeddings +language: hi +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_hindi_marathi_dev_bert_pipeline` is a Hindi model originally trained by l3cube-pune. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_hindi_marathi_dev_bert_pipeline_hi_5.5.0_3.0_1727252074552.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_hindi_marathi_dev_bert_pipeline_hi_5.5.0_3.0_1727252074552.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_hindi_marathi_dev_bert_pipeline", lang = "hi") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_hindi_marathi_dev_bert_pipeline", lang = "hi") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_hindi_marathi_dev_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|hi| +|Size:|891.2 MB| + +## References + +https://huggingface.co/l3cube-pune/hindi-marathi-dev-bert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_hindi_wordpiece_bert_test_2m_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_hindi_wordpiece_bert_test_2m_en.md new file mode 100644 index 00000000000000..6ba71c52633d2a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_hindi_wordpiece_bert_test_2m_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_hindi_wordpiece_bert_test_2m BertSentenceEmbeddings from rg1683 +author: John Snow Labs +name: sent_hindi_wordpiece_bert_test_2m +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_hindi_wordpiece_bert_test_2m` is a English model originally trained by rg1683. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_hindi_wordpiece_bert_test_2m_en_5.5.0_3.0_1727249119712.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_hindi_wordpiece_bert_test_2m_en_5.5.0_3.0_1727249119712.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_hindi_wordpiece_bert_test_2m","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_hindi_wordpiece_bert_test_2m","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_hindi_wordpiece_bert_test_2m| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|377.7 MB| + +## References + +https://huggingface.co/rg1683/hindi_wordpiece_bert_test_2m \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_hindi_wordpiece_bert_test_2m_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_hindi_wordpiece_bert_test_2m_pipeline_en.md new file mode 100644 index 00000000000000..333a4b450b0cda --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_hindi_wordpiece_bert_test_2m_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_hindi_wordpiece_bert_test_2m_pipeline pipeline BertSentenceEmbeddings from rg1683 +author: John Snow Labs +name: sent_hindi_wordpiece_bert_test_2m_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_hindi_wordpiece_bert_test_2m_pipeline` is a English model originally trained by rg1683. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_hindi_wordpiece_bert_test_2m_pipeline_en_5.5.0_3.0_1727249140694.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_hindi_wordpiece_bert_test_2m_pipeline_en_5.5.0_3.0_1727249140694.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_hindi_wordpiece_bert_test_2m_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_hindi_wordpiece_bert_test_2m_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_hindi_wordpiece_bert_test_2m_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|378.3 MB| + +## References + +https://huggingface.co/rg1683/hindi_wordpiece_bert_test_2m + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_javanese_bert_small_jv.md b/docs/_posts/ahmedlone127/2024-09-25-sent_javanese_bert_small_jv.md new file mode 100644 index 00000000000000..05cb9c67205aa3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_javanese_bert_small_jv.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Javanese sent_javanese_bert_small BertSentenceEmbeddings from w11wo +author: John Snow Labs +name: sent_javanese_bert_small +date: 2024-09-25 +tags: [jv, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: jv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_javanese_bert_small` is a Javanese model originally trained by w11wo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_javanese_bert_small_jv_5.5.0_3.0_1727251545296.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_javanese_bert_small_jv_5.5.0_3.0_1727251545296.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_javanese_bert_small","jv") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_javanese_bert_small","jv") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_javanese_bert_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|jv| +|Size:|407.3 MB| + +## References + +https://huggingface.co/w11wo/javanese-bert-small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_javanese_bert_small_pipeline_jv.md b/docs/_posts/ahmedlone127/2024-09-25-sent_javanese_bert_small_pipeline_jv.md new file mode 100644 index 00000000000000..4042c91c5e530a --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_javanese_bert_small_pipeline_jv.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Javanese sent_javanese_bert_small_pipeline pipeline BertSentenceEmbeddings from w11wo +author: John Snow Labs +name: sent_javanese_bert_small_pipeline +date: 2024-09-25 +tags: [jv, open_source, pipeline, onnx] +task: Embeddings +language: jv +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_javanese_bert_small_pipeline` is a Javanese model originally trained by w11wo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_javanese_bert_small_pipeline_jv_5.5.0_3.0_1727251567026.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_javanese_bert_small_pipeline_jv_5.5.0_3.0_1727251567026.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_javanese_bert_small_pipeline", lang = "jv") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_javanese_bert_small_pipeline", lang = "jv") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_javanese_bert_small_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|jv| +|Size:|407.8 MB| + +## References + +https://huggingface.co/w11wo/javanese-bert-small + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_knowbias_bert_base_uncased_gender_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_knowbias_bert_base_uncased_gender_en.md new file mode 100644 index 00000000000000..240f0bb0049c7f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_knowbias_bert_base_uncased_gender_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_knowbias_bert_base_uncased_gender BertSentenceEmbeddings from squiduu +author: John Snow Labs +name: sent_knowbias_bert_base_uncased_gender +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_knowbias_bert_base_uncased_gender` is a English model originally trained by squiduu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_knowbias_bert_base_uncased_gender_en_5.5.0_3.0_1727252460548.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_knowbias_bert_base_uncased_gender_en_5.5.0_3.0_1727252460548.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_knowbias_bert_base_uncased_gender","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_knowbias_bert_base_uncased_gender","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_knowbias_bert_base_uncased_gender| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/squiduu/knowbias-bert-base-uncased-gender \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_knowbias_bert_base_uncased_gender_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_knowbias_bert_base_uncased_gender_pipeline_en.md new file mode 100644 index 00000000000000..edc3ba98ab60e3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_knowbias_bert_base_uncased_gender_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_knowbias_bert_base_uncased_gender_pipeline pipeline BertSentenceEmbeddings from squiduu +author: John Snow Labs +name: sent_knowbias_bert_base_uncased_gender_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_knowbias_bert_base_uncased_gender_pipeline` is a English model originally trained by squiduu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_knowbias_bert_base_uncased_gender_pipeline_en_5.5.0_3.0_1727252482647.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_knowbias_bert_base_uncased_gender_pipeline_en_5.5.0_3.0_1727252482647.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_knowbias_bert_base_uncased_gender_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_knowbias_bert_base_uncased_gender_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_knowbias_bert_base_uncased_gender_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/squiduu/knowbias-bert-base-uncased-gender + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_marathi_bert_smaller_mr.md b/docs/_posts/ahmedlone127/2024-09-25-sent_marathi_bert_smaller_mr.md new file mode 100644 index 00000000000000..b2b5934f5b7b99 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_marathi_bert_smaller_mr.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Marathi sent_marathi_bert_smaller BertSentenceEmbeddings from l3cube-pune +author: John Snow Labs +name: sent_marathi_bert_smaller +date: 2024-09-25 +tags: [mr, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: mr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_marathi_bert_smaller` is a Marathi model originally trained by l3cube-pune. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_marathi_bert_smaller_mr_5.5.0_3.0_1727252791211.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_marathi_bert_smaller_mr_5.5.0_3.0_1727252791211.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_marathi_bert_smaller","mr") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_marathi_bert_smaller","mr") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_marathi_bert_smaller| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|mr| +|Size:|204.9 MB| + +## References + +https://huggingface.co/l3cube-pune/marathi-bert-smaller \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_marathi_bert_smaller_pipeline_mr.md b/docs/_posts/ahmedlone127/2024-09-25-sent_marathi_bert_smaller_pipeline_mr.md new file mode 100644 index 00000000000000..65572f04a07685 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_marathi_bert_smaller_pipeline_mr.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Marathi sent_marathi_bert_smaller_pipeline pipeline BertSentenceEmbeddings from l3cube-pune +author: John Snow Labs +name: sent_marathi_bert_smaller_pipeline +date: 2024-09-25 +tags: [mr, open_source, pipeline, onnx] +task: Embeddings +language: mr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_marathi_bert_smaller_pipeline` is a Marathi model originally trained by l3cube-pune. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_marathi_bert_smaller_pipeline_mr_5.5.0_3.0_1727252805395.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_marathi_bert_smaller_pipeline_mr_5.5.0_3.0_1727252805395.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_marathi_bert_smaller_pipeline", lang = "mr") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_marathi_bert_smaller_pipeline", lang = "mr") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_marathi_bert_smaller_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|mr| +|Size:|205.4 MB| + +## References + +https://huggingface.co/l3cube-pune/marathi-bert-smaller + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_medruberttiny2_ru.md b/docs/_posts/ahmedlone127/2024-09-25-sent_medruberttiny2_ru.md new file mode 100644 index 00000000000000..1d84ba3e5b8fe5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_medruberttiny2_ru.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Russian sent_medruberttiny2 BertSentenceEmbeddings from DmitryPogrebnoy +author: John Snow Labs +name: sent_medruberttiny2 +date: 2024-09-25 +tags: [ru, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: ru +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_medruberttiny2` is a Russian model originally trained by DmitryPogrebnoy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_medruberttiny2_ru_5.5.0_3.0_1727248896378.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_medruberttiny2_ru_5.5.0_3.0_1727248896378.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_medruberttiny2","ru") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_medruberttiny2","ru") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_medruberttiny2| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|ru| +|Size:|109.1 MB| + +## References + +https://huggingface.co/DmitryPogrebnoy/MedRuBertTiny2 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_miem_scibert_linguistic_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_miem_scibert_linguistic_en.md new file mode 100644 index 00000000000000..f38947d93d9587 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_miem_scibert_linguistic_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_miem_scibert_linguistic BertSentenceEmbeddings from miemBertProject +author: John Snow Labs +name: sent_miem_scibert_linguistic +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_miem_scibert_linguistic` is a English model originally trained by miemBertProject. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_miem_scibert_linguistic_en_5.5.0_3.0_1727249254687.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_miem_scibert_linguistic_en_5.5.0_3.0_1727249254687.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_miem_scibert_linguistic","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_miem_scibert_linguistic","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_miem_scibert_linguistic| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|657.4 MB| + +## References + +https://huggingface.co/miemBertProject/miem-scibert-linguistic \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_minilmv2_l6_h384_distilled_from_bert_base_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_minilmv2_l6_h384_distilled_from_bert_base_pipeline_en.md new file mode 100644 index 00000000000000..609d5af33a5f92 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_minilmv2_l6_h384_distilled_from_bert_base_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_minilmv2_l6_h384_distilled_from_bert_base_pipeline pipeline BertSentenceEmbeddings from nreimers +author: John Snow Labs +name: sent_minilmv2_l6_h384_distilled_from_bert_base_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_minilmv2_l6_h384_distilled_from_bert_base_pipeline` is a English model originally trained by nreimers. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_minilmv2_l6_h384_distilled_from_bert_base_pipeline_en_5.5.0_3.0_1727230836299.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_minilmv2_l6_h384_distilled_from_bert_base_pipeline_en_5.5.0_3.0_1727230836299.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_minilmv2_l6_h384_distilled_from_bert_base_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_minilmv2_l6_h384_distilled_from_bert_base_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_minilmv2_l6_h384_distilled_from_bert_base_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|54.7 MB| + +## References + +https://huggingface.co/nreimers/MiniLMv2-L6-H384-distilled-from-BERT-Base + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_mitre_bert_small_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_mitre_bert_small_en.md new file mode 100644 index 00000000000000..366f761751cd5b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_mitre_bert_small_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_mitre_bert_small BertSentenceEmbeddings from bencyc1129 +author: John Snow Labs +name: sent_mitre_bert_small +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_mitre_bert_small` is a English model originally trained by bencyc1129. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_mitre_bert_small_en_5.5.0_3.0_1727234422468.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_mitre_bert_small_en_5.5.0_3.0_1727234422468.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_mitre_bert_small","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_mitre_bert_small","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_mitre_bert_small| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|108.5 MB| + +## References + +https://huggingface.co/bencyc1129/mitre-bert-small \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_mitre_bert_small_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_mitre_bert_small_pipeline_en.md new file mode 100644 index 00000000000000..859da057a86d43 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_mitre_bert_small_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_mitre_bert_small_pipeline pipeline BertSentenceEmbeddings from bencyc1129 +author: John Snow Labs +name: sent_mitre_bert_small_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_mitre_bert_small_pipeline` is a English model originally trained by bencyc1129. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_mitre_bert_small_pipeline_en_5.5.0_3.0_1727234427869.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_mitre_bert_small_pipeline_en_5.5.0_3.0_1727234427869.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_mitre_bert_small_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_mitre_bert_small_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_mitre_bert_small_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|109.1 MB| + +## References + +https://huggingface.co/bencyc1129/mitre-bert-small + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_nepal_bhasa_bert_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_nepal_bhasa_bert_en.md new file mode 100644 index 00000000000000..0f1623707e7dec --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_nepal_bhasa_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_nepal_bhasa_bert BertSentenceEmbeddings from searchfind +author: John Snow Labs +name: sent_nepal_bhasa_bert +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_nepal_bhasa_bert` is a English model originally trained by searchfind. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_nepal_bhasa_bert_en_5.5.0_3.0_1727253496987.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_nepal_bhasa_bert_en_5.5.0_3.0_1727253496987.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_nepal_bhasa_bert","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_nepal_bhasa_bert","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_nepal_bhasa_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|403.6 MB| + +## References + +https://huggingface.co/searchfind/New_BERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_nepal_bhasa_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_nepal_bhasa_bert_pipeline_en.md new file mode 100644 index 00000000000000..a3c5e7b1ccfd83 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_nepal_bhasa_bert_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_nepal_bhasa_bert_pipeline pipeline BertSentenceEmbeddings from searchfind +author: John Snow Labs +name: sent_nepal_bhasa_bert_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_nepal_bhasa_bert_pipeline` is a English model originally trained by searchfind. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_nepal_bhasa_bert_pipeline_en_5.5.0_3.0_1727253519790.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_nepal_bhasa_bert_pipeline_en_5.5.0_3.0_1727253519790.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_nepal_bhasa_bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_nepal_bhasa_bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_nepal_bhasa_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|404.2 MB| + +## References + +https://huggingface.co/searchfind/New_BERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_norbert_no.md b/docs/_posts/ahmedlone127/2024-09-25-sent_norbert_no.md new file mode 100644 index 00000000000000..4549a29e9105b5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_norbert_no.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Norwegian sent_norbert BertSentenceEmbeddings from ltg +author: John Snow Labs +name: sent_norbert +date: 2024-09-25 +tags: ["no", open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: "no" +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_norbert` is a Norwegian model originally trained by ltg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_norbert_no_5.5.0_3.0_1727253356669.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_norbert_no_5.5.0_3.0_1727253356669.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_norbert","no") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_norbert","no") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_norbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|no| +|Size:|415.2 MB| + +## References + +https://huggingface.co/ltg/norbert \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_norbert_pipeline_no.md b/docs/_posts/ahmedlone127/2024-09-25-sent_norbert_pipeline_no.md new file mode 100644 index 00000000000000..b602290fecdd90 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_norbert_pipeline_no.md @@ -0,0 +1,71 @@ +--- +layout: model +title: Norwegian sent_norbert_pipeline pipeline BertSentenceEmbeddings from ltg +author: John Snow Labs +name: sent_norbert_pipeline +date: 2024-09-25 +tags: ["no", open_source, pipeline, onnx] +task: Embeddings +language: "no" +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_norbert_pipeline` is a Norwegian model originally trained by ltg. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_norbert_pipeline_no_5.5.0_3.0_1727253378130.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_norbert_pipeline_no_5.5.0_3.0_1727253378130.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_norbert_pipeline", lang = "no") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_norbert_pipeline", lang = "no") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_norbert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|no| +|Size:|415.7 MB| + +## References + +https://huggingface.co/ltg/norbert + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_prompt_finetune_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_prompt_finetune_en.md new file mode 100644 index 00000000000000..8e58c290ee8033 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_prompt_finetune_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_prompt_finetune BertSentenceEmbeddings from AndyJ +author: John Snow Labs +name: sent_prompt_finetune +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_prompt_finetune` is a English model originally trained by AndyJ. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_prompt_finetune_en_5.5.0_3.0_1727234728465.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_prompt_finetune_en_5.5.0_3.0_1727234728465.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_prompt_finetune","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_prompt_finetune","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_prompt_finetune| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|408.0 MB| + +## References + +https://huggingface.co/AndyJ/prompt_finetune \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_prompt_finetune_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_prompt_finetune_pipeline_en.md new file mode 100644 index 00000000000000..a7f20c53e38842 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_prompt_finetune_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_prompt_finetune_pipeline pipeline BertSentenceEmbeddings from AndyJ +author: John Snow Labs +name: sent_prompt_finetune_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_prompt_finetune_pipeline` is a English model originally trained by AndyJ. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_prompt_finetune_pipeline_en_5.5.0_3.0_1727234749748.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_prompt_finetune_pipeline_en_5.5.0_3.0_1727234749748.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_prompt_finetune_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_prompt_finetune_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_prompt_finetune_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.6 MB| + +## References + +https://huggingface.co/AndyJ/prompt_finetune + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_protaugment_lm_clinic150_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_protaugment_lm_clinic150_en.md new file mode 100644 index 00000000000000..64f095398e66c7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_protaugment_lm_clinic150_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_protaugment_lm_clinic150 BertSentenceEmbeddings from tdopierre +author: John Snow Labs +name: sent_protaugment_lm_clinic150 +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_protaugment_lm_clinic150` is a English model originally trained by tdopierre. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_protaugment_lm_clinic150_en_5.5.0_3.0_1727235126826.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_protaugment_lm_clinic150_en_5.5.0_3.0_1727235126826.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_protaugment_lm_clinic150","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_protaugment_lm_clinic150","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_protaugment_lm_clinic150| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|403.4 MB| + +## References + +https://huggingface.co/tdopierre/ProtAugment-LM-Clinic150 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_protaugment_lm_clinic150_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_protaugment_lm_clinic150_pipeline_en.md new file mode 100644 index 00000000000000..945a985e99a0f4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_protaugment_lm_clinic150_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_protaugment_lm_clinic150_pipeline pipeline BertSentenceEmbeddings from tdopierre +author: John Snow Labs +name: sent_protaugment_lm_clinic150_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_protaugment_lm_clinic150_pipeline` is a English model originally trained by tdopierre. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_protaugment_lm_clinic150_pipeline_en_5.5.0_3.0_1727235147029.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_protaugment_lm_clinic150_pipeline_en_5.5.0_3.0_1727235147029.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_protaugment_lm_clinic150_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_protaugment_lm_clinic150_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_protaugment_lm_clinic150_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|404.0 MB| + +## References + +https://huggingface.co/tdopierre/ProtAugment-LM-Clinic150 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_retromae_msmarco_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_retromae_msmarco_en.md new file mode 100644 index 00000000000000..e04448e320e66b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_retromae_msmarco_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_retromae_msmarco BertSentenceEmbeddings from Shitao +author: John Snow Labs +name: sent_retromae_msmarco +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_retromae_msmarco` is a English model originally trained by Shitao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_retromae_msmarco_en_5.5.0_3.0_1727230352092.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_retromae_msmarco_en_5.5.0_3.0_1727230352092.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_retromae_msmarco","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_retromae_msmarco","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_retromae_msmarco| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/Shitao/RetroMAE_MSMARCO \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_retromae_msmarco_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_retromae_msmarco_pipeline_en.md new file mode 100644 index 00000000000000..229997ba616906 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_retromae_msmarco_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_retromae_msmarco_pipeline pipeline BertSentenceEmbeddings from Shitao +author: John Snow Labs +name: sent_retromae_msmarco_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_retromae_msmarco_pipeline` is a English model originally trained by Shitao. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_retromae_msmarco_pipeline_en_5.5.0_3.0_1727230372703.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_retromae_msmarco_pipeline_en_5.5.0_3.0_1727230372703.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_retromae_msmarco_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_retromae_msmarco_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_retromae_msmarco_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|408.2 MB| + +## References + +https://huggingface.co/Shitao/RetroMAE_MSMARCO + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_scholarbert_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_scholarbert_en.md new file mode 100644 index 00000000000000..08afc7e3ba1e64 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_scholarbert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_scholarbert BertSentenceEmbeddings from globuslabs +author: John Snow Labs +name: sent_scholarbert +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_scholarbert` is a English model originally trained by globuslabs. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_scholarbert_en_5.5.0_3.0_1727253411352.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_scholarbert_en_5.5.0_3.0_1727253411352.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_scholarbert","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_scholarbert","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_scholarbert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/globuslabs/ScholarBERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_viz_wiz_bert_base_uncased_f32_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_viz_wiz_bert_base_uncased_f32_en.md new file mode 100644 index 00000000000000..4942df83b4b9d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_viz_wiz_bert_base_uncased_f32_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sent_viz_wiz_bert_base_uncased_f32 BertSentenceEmbeddings from eisenjulian +author: John Snow Labs +name: sent_viz_wiz_bert_base_uncased_f32 +date: 2024-09-25 +tags: [en, open_source, onnx, sentence_embeddings, bert] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertSentenceEmbeddings +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_viz_wiz_bert_base_uncased_f32` is a English model originally trained by eisenjulian. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_viz_wiz_bert_base_uncased_f32_en_5.5.0_3.0_1727234629727.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_viz_wiz_bert_base_uncased_f32_en_5.5.0_3.0_1727234629727.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") \ + .setInputCols(["document"]) \ + .setOutputCol("sentence") + +embeddings = BertSentenceEmbeddings.pretrained("sent_viz_wiz_bert_base_uncased_f32","en") \ + .setInputCols(["sentence"]) \ + .setOutputCol("embeddings") + +pipeline = Pipeline().setStages([documentAssembler, sentenceDL, embeddings]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val sentenceDL = SentenceDetectorDLModel.pretrained("sentence_detector_dl", "xx") + .setInputCols(Array("document")) + .setOutputCol("sentence") + +val embeddings = BertSentenceEmbeddings.pretrained("sent_viz_wiz_bert_base_uncased_f32","en") + .setInputCols(Array("sentence")) + .setOutputCol("embeddings") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, sentenceDL, embeddings)) +val data = Seq("I love spark-nlp").toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_viz_wiz_bert_base_uncased_f32| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[sentence]| +|Output Labels:|[embeddings]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/eisenjulian/viz-wiz-bert-base-uncased_f32 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sent_viz_wiz_bert_base_uncased_f32_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sent_viz_wiz_bert_base_uncased_f32_pipeline_en.md new file mode 100644 index 00000000000000..465446701401cb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sent_viz_wiz_bert_base_uncased_f32_pipeline_en.md @@ -0,0 +1,71 @@ +--- +layout: model +title: English sent_viz_wiz_bert_base_uncased_f32_pipeline pipeline BertSentenceEmbeddings from eisenjulian +author: John Snow Labs +name: sent_viz_wiz_bert_base_uncased_f32_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Embeddings +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertSentenceEmbeddings, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sent_viz_wiz_bert_base_uncased_f32_pipeline` is a English model originally trained by eisenjulian. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sent_viz_wiz_bert_base_uncased_f32_pipeline_en_5.5.0_3.0_1727234650812.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sent_viz_wiz_bert_base_uncased_f32_pipeline_en_5.5.0_3.0_1727234650812.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sent_viz_wiz_bert_base_uncased_f32_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sent_viz_wiz_bert_base_uncased_f32_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sent_viz_wiz_bert_base_uncased_f32_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|407.7 MB| + +## References + +https://huggingface.co/eisenjulian/viz-wiz-bert-base-uncased_f32 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- SentenceDetectorDLModel +- BertSentenceEmbeddings \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sentiment_ita_en.md b/docs/_posts/ahmedlone127/2024-09-25-sentiment_ita_en.md new file mode 100644 index 00000000000000..e01fbd133b19d5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sentiment_ita_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English sentiment_ita BertForSequenceClassification from luigisaetta +author: John Snow Labs +name: sentiment_ita +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_ita` is a English model originally trained by luigisaetta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_ita_en_5.5.0_3.0_1727222587188.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_ita_en_5.5.0_3.0_1727222587188.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("sentiment_ita","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("sentiment_ita", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_ita| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|414.8 MB| + +## References + +https://huggingface.co/luigisaetta/sentiment_ita \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sentiment_ita_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sentiment_ita_pipeline_en.md new file mode 100644 index 00000000000000..2cad2cb329804d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sentiment_ita_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sentiment_ita_pipeline pipeline BertForSequenceClassification from luigisaetta +author: John Snow Labs +name: sentiment_ita_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sentiment_ita_pipeline` is a English model originally trained by luigisaetta. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sentiment_ita_pipeline_en_5.5.0_3.0_1727222608612.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sentiment_ita_pipeline_en_5.5.0_3.0_1727222608612.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sentiment_ita_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sentiment_ita_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sentiment_ita_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|414.9 MB| + +## References + +https://huggingface.co/luigisaetta/sentiment_ita + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-shulchan_aruch_classifier_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-shulchan_aruch_classifier_pipeline_en.md new file mode 100644 index 00000000000000..623334a386b08c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-shulchan_aruch_classifier_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English shulchan_aruch_classifier_pipeline pipeline BertForSequenceClassification from sivan22 +author: John Snow Labs +name: shulchan_aruch_classifier_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`shulchan_aruch_classifier_pipeline` is a English model originally trained by sivan22. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/shulchan_aruch_classifier_pipeline_en_5.5.0_3.0_1727273346179.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/shulchan_aruch_classifier_pipeline_en_5.5.0_3.0_1727273346179.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("shulchan_aruch_classifier_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("shulchan_aruch_classifier_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|shulchan_aruch_classifier_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|692.2 MB| + +## References + +https://huggingface.co/sivan22/shulchan-aruch-classifier + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-snli_test_100k_en.md b/docs/_posts/ahmedlone127/2024-09-25-snli_test_100k_en.md new file mode 100644 index 00000000000000..5aa9d0263afbc6 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-snli_test_100k_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English snli_test_100k BertForSequenceClassification from grace-pro +author: John Snow Labs +name: snli_test_100k +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`snli_test_100k` is a English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/snli_test_100k_en_5.5.0_3.0_1727278442647.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/snli_test_100k_en_5.5.0_3.0_1727278442647.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("snli_test_100k","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("snli_test_100k", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|snli_test_100k| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/grace-pro/snli_test_100k \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-snli_test_100k_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-snli_test_100k_pipeline_en.md new file mode 100644 index 00000000000000..fc386b85fbc191 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-snli_test_100k_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English snli_test_100k_pipeline pipeline BertForSequenceClassification from grace-pro +author: John Snow Labs +name: snli_test_100k_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`snli_test_100k_pipeline` is a English model originally trained by grace-pro. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/snli_test_100k_pipeline_en_5.5.0_3.0_1727278464665.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/snli_test_100k_pipeline_en_5.5.0_3.0_1727278464665.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("snli_test_100k_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("snli_test_100k_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|snli_test_100k_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/grace-pro/snli_test_100k + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-star_predictor_en.md b/docs/_posts/ahmedlone127/2024-09-25-star_predictor_en.md new file mode 100644 index 00000000000000..b67a2019095d35 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-star_predictor_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English star_predictor BertForSequenceClassification from Yanni8 +author: John Snow Labs +name: star_predictor +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`star_predictor` is a English model originally trained by Yanni8. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/star_predictor_en_5.5.0_3.0_1727268038749.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/star_predictor_en_5.5.0_3.0_1727268038749.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("star_predictor","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("star_predictor", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|star_predictor| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Yanni8/star-predictor \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-stereoset_bert_base_uncased_classifieronly_en.md b/docs/_posts/ahmedlone127/2024-09-25-stereoset_bert_base_uncased_classifieronly_en.md new file mode 100644 index 00000000000000..2ffc06ce131289 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-stereoset_bert_base_uncased_classifieronly_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English stereoset_bert_base_uncased_classifieronly BertForSequenceClassification from henryscheible +author: John Snow Labs +name: stereoset_bert_base_uncased_classifieronly +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stereoset_bert_base_uncased_classifieronly` is a English model originally trained by henryscheible. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stereoset_bert_base_uncased_classifieronly_en_5.5.0_3.0_1727286038183.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stereoset_bert_base_uncased_classifieronly_en_5.5.0_3.0_1727286038183.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("stereoset_bert_base_uncased_classifieronly","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("stereoset_bert_base_uncased_classifieronly", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stereoset_bert_base_uncased_classifieronly| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/henryscheible/stereoset_bert-base-uncased_classifieronly \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-stsb_vn_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-stsb_vn_pipeline_en.md new file mode 100644 index 00000000000000..09fb502aac397d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-stsb_vn_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English stsb_vn_pipeline pipeline BertForSequenceClassification from ntrnghia +author: John Snow Labs +name: stsb_vn_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`stsb_vn_pipeline` is a English model originally trained by ntrnghia. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/stsb_vn_pipeline_en_5.5.0_3.0_1727266098699.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/stsb_vn_pipeline_en_5.5.0_3.0_1727266098699.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("stsb_vn_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("stsb_vn_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|stsb_vn_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|667.3 MB| + +## References + +https://huggingface.co/ntrnghia/stsb_vn + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-sustainable_finance_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-sustainable_finance_bert_pipeline_en.md new file mode 100644 index 00000000000000..16d0b1cbdbc427 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-sustainable_finance_bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English sustainable_finance_bert_pipeline pipeline BertForSequenceClassification from Pelumioluwa +author: John Snow Labs +name: sustainable_finance_bert_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`sustainable_finance_bert_pipeline` is a English model originally trained by Pelumioluwa. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/sustainable_finance_bert_pipeline_en_5.5.0_3.0_1727262007229.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/sustainable_finance_bert_pipeline_en_5.5.0_3.0_1727262007229.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("sustainable_finance_bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("sustainable_finance_bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|sustainable_finance_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Pelumioluwa/Sustainable-Finance-BERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_pipeline_en.md new file mode 100644 index 00000000000000..2f6781c1a9d3a8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_pipeline pipeline BertForSequenceClassification from ajtamayoh +author: John Snow Labs +name: symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_pipeline` is a English model originally trained by ajtamayoh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_pipeline_en_5.5.0_3.0_1727261105110.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_pipeline_en_5.5.0_3.0_1727261105110.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|symptoms_tonga_tonga_islands_diagnosis_sonatafyai_bert_v1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/ajtamayoh/Symptoms_to_Diagnosis_SonatafyAI_BERT_v1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-t_frex_bert_large_uncased_en.md b/docs/_posts/ahmedlone127/2024-09-25-t_frex_bert_large_uncased_en.md new file mode 100644 index 00000000000000..3ac4d20ef2381f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-t_frex_bert_large_uncased_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English t_frex_bert_large_uncased BertForTokenClassification from quim-motger +author: John Snow Labs +name: t_frex_bert_large_uncased +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`t_frex_bert_large_uncased` is a English model originally trained by quim-motger. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/t_frex_bert_large_uncased_en_5.5.0_3.0_1727271769440.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/t_frex_bert_large_uncased_en_5.5.0_3.0_1727271769440.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("t_frex_bert_large_uncased","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("t_frex_bert_large_uncased", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|t_frex_bert_large_uncased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/quim-motger/t-frex-bert-large-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-teknofest_nlp_finetuned_tddi_en.md b/docs/_posts/ahmedlone127/2024-09-25-teknofest_nlp_finetuned_tddi_en.md new file mode 100644 index 00000000000000..c32dbafc208f94 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-teknofest_nlp_finetuned_tddi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English teknofest_nlp_finetuned_tddi BertForSequenceClassification from OnurSahh +author: John Snow Labs +name: teknofest_nlp_finetuned_tddi +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`teknofest_nlp_finetuned_tddi` is a English model originally trained by OnurSahh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/teknofest_nlp_finetuned_tddi_en_5.5.0_3.0_1727263454258.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/teknofest_nlp_finetuned_tddi_en_5.5.0_3.0_1727263454258.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("teknofest_nlp_finetuned_tddi","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("teknofest_nlp_finetuned_tddi", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|teknofest_nlp_finetuned_tddi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|414.5 MB| + +## References + +https://huggingface.co/OnurSahh/teknofest_nlp_finetuned_tddi \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-teknofest_nlp_finetuned_tddi_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-teknofest_nlp_finetuned_tddi_pipeline_en.md new file mode 100644 index 00000000000000..d551da14abe8dc --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-teknofest_nlp_finetuned_tddi_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English teknofest_nlp_finetuned_tddi_pipeline pipeline BertForSequenceClassification from OnurSahh +author: John Snow Labs +name: teknofest_nlp_finetuned_tddi_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`teknofest_nlp_finetuned_tddi_pipeline` is a English model originally trained by OnurSahh. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/teknofest_nlp_finetuned_tddi_pipeline_en_5.5.0_3.0_1727263478896.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/teknofest_nlp_finetuned_tddi_pipeline_en_5.5.0_3.0_1727263478896.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("teknofest_nlp_finetuned_tddi_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("teknofest_nlp_finetuned_tddi_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|teknofest_nlp_finetuned_tddi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|414.6 MB| + +## References + +https://huggingface.co/OnurSahh/teknofest_nlp_finetuned_tddi + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-tempclin_biobertpt_clin_pipeline_pt.md b/docs/_posts/ahmedlone127/2024-09-25-tempclin_biobertpt_clin_pipeline_pt.md new file mode 100644 index 00000000000000..57abc1c5f046d7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-tempclin_biobertpt_clin_pipeline_pt.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Portuguese tempclin_biobertpt_clin_pipeline pipeline BertForTokenClassification from pucpr-br +author: John Snow Labs +name: tempclin_biobertpt_clin_pipeline +date: 2024-09-25 +tags: [pt, open_source, pipeline, onnx] +task: Named Entity Recognition +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tempclin_biobertpt_clin_pipeline` is a Portuguese model originally trained by pucpr-br. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tempclin_biobertpt_clin_pipeline_pt_5.5.0_3.0_1727271160027.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tempclin_biobertpt_clin_pipeline_pt_5.5.0_3.0_1727271160027.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tempclin_biobertpt_clin_pipeline", lang = "pt") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tempclin_biobertpt_clin_pipeline", lang = "pt") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tempclin_biobertpt_clin_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|pt| +|Size:|665.1 MB| + +## References + +https://huggingface.co/pucpr-br/tempclin-biobertpt-clin + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-test_hub_push_en.md b/docs/_posts/ahmedlone127/2024-09-25-test_hub_push_en.md new file mode 100644 index 00000000000000..c5bdc6fe939a1f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-test_hub_push_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test_hub_push BertForSequenceClassification from Tonita +author: John Snow Labs +name: test_hub_push +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_hub_push` is a English model originally trained by Tonita. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_hub_push_en_5.5.0_3.0_1727287918238.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_hub_push_en_5.5.0_3.0_1727287918238.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("test_hub_push","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("test_hub_push", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_hub_push| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Tonita/test-hub-push \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-test_ner_rundi_en.md b/docs/_posts/ahmedlone127/2024-09-25-test_ner_rundi_en.md new file mode 100644 index 00000000000000..8194cd2dfb7c0d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-test_ner_rundi_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test_ner_rundi BertForTokenClassification from lltala +author: John Snow Labs +name: test_ner_rundi +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_ner_rundi` is a English model originally trained by lltala. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_ner_rundi_en_5.5.0_3.0_1727283624953.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_ner_rundi_en_5.5.0_3.0_1727283624953.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("test_ner_rundi","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("test_ner_rundi", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_ner_rundi| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|407.2 MB| + +## References + +https://huggingface.co/lltala/test-ner-run \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-test_trainer_gaito_20_en.md b/docs/_posts/ahmedlone127/2024-09-25-test_trainer_gaito_20_en.md new file mode 100644 index 00000000000000..53b6148a3b323e --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-test_trainer_gaito_20_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English test_trainer_gaito_20 BertForSequenceClassification from gaito-20 +author: John Snow Labs +name: test_trainer_gaito_20 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_trainer_gaito_20` is a English model originally trained by gaito-20. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_trainer_gaito_20_en_5.5.0_3.0_1727269856481.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_trainer_gaito_20_en_5.5.0_3.0_1727269856481.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("test_trainer_gaito_20","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("test_trainer_gaito_20", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_trainer_gaito_20| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/gaito-20/test-trainer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-test_trainer_gaito_20_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-test_trainer_gaito_20_pipeline_en.md new file mode 100644 index 00000000000000..5527c02fcc2f8c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-test_trainer_gaito_20_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English test_trainer_gaito_20_pipeline pipeline BertForSequenceClassification from gaito-20 +author: John Snow Labs +name: test_trainer_gaito_20_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`test_trainer_gaito_20_pipeline` is a English model originally trained by gaito-20. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/test_trainer_gaito_20_pipeline_en_5.5.0_3.0_1727269878752.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/test_trainer_gaito_20_pipeline_en_5.5.0_3.0_1727269878752.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("test_trainer_gaito_20_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("test_trainer_gaito_20_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|test_trainer_gaito_20_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/gaito-20/test-trainer + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-tos_bert_en.md b/docs/_posts/ahmedlone127/2024-09-25-tos_bert_en.md new file mode 100644 index 00000000000000..8c44091d460729 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-tos_bert_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tos_bert BertForSequenceClassification from prasannadhungana8848 +author: John Snow Labs +name: tos_bert +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tos_bert` is a English model originally trained by prasannadhungana8848. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tos_bert_en_5.5.0_3.0_1727257634525.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tos_bert_en_5.5.0_3.0_1727257634525.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("tos_bert","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("tos_bert", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tos_bert| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/prasannadhungana8848/TOS_BERT \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-toxicity_type_detection_pipeline_pt.md b/docs/_posts/ahmedlone127/2024-09-25-toxicity_type_detection_pipeline_pt.md new file mode 100644 index 00000000000000..4b5a1981a1b04d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-toxicity_type_detection_pipeline_pt.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Portuguese toxicity_type_detection_pipeline pipeline BertForSequenceClassification from dougtrajano +author: John Snow Labs +name: toxicity_type_detection_pipeline +date: 2024-09-25 +tags: [pt, open_source, pipeline, onnx] +task: Text Classification +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`toxicity_type_detection_pipeline` is a Portuguese model originally trained by dougtrajano. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/toxicity_type_detection_pipeline_pt_5.5.0_3.0_1727265790660.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/toxicity_type_detection_pipeline_pt_5.5.0_3.0_1727265790660.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("toxicity_type_detection_pipeline", lang = "pt") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("toxicity_type_detection_pipeline", lang = "pt") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|toxicity_type_detection_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|pt| +|Size:|408.2 MB| + +## References + +https://huggingface.co/dougtrajano/toxicity-type-detection + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-toxicity_type_detection_pt.md b/docs/_posts/ahmedlone127/2024-09-25-toxicity_type_detection_pt.md new file mode 100644 index 00000000000000..f9a3cc1d38edf7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-toxicity_type_detection_pt.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Portuguese toxicity_type_detection BertForSequenceClassification from dougtrajano +author: John Snow Labs +name: toxicity_type_detection +date: 2024-09-25 +tags: [pt, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`toxicity_type_detection` is a Portuguese model originally trained by dougtrajano. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/toxicity_type_detection_pt_5.5.0_3.0_1727265768511.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/toxicity_type_detection_pt_5.5.0_3.0_1727265768511.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("toxicity_type_detection","pt") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("toxicity_type_detection", "pt") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|toxicity_type_detection| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|pt| +|Size:|408.2 MB| + +## References + +https://huggingface.co/dougtrajano/toxicity-type-detection \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-trac2020_iben_a_bert_base_multilingual_uncased_pipeline_xx.md b/docs/_posts/ahmedlone127/2024-09-25-trac2020_iben_a_bert_base_multilingual_uncased_pipeline_xx.md new file mode 100644 index 00000000000000..217788124d4f28 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-trac2020_iben_a_bert_base_multilingual_uncased_pipeline_xx.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Multilingual trac2020_iben_a_bert_base_multilingual_uncased_pipeline pipeline BertForSequenceClassification from socialmediaie +author: John Snow Labs +name: trac2020_iben_a_bert_base_multilingual_uncased_pipeline +date: 2024-09-25 +tags: [xx, open_source, pipeline, onnx] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trac2020_iben_a_bert_base_multilingual_uncased_pipeline` is a Multilingual model originally trained by socialmediaie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trac2020_iben_a_bert_base_multilingual_uncased_pipeline_xx_5.5.0_3.0_1727257138597.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trac2020_iben_a_bert_base_multilingual_uncased_pipeline_xx_5.5.0_3.0_1727257138597.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("trac2020_iben_a_bert_base_multilingual_uncased_pipeline", lang = "xx") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("trac2020_iben_a_bert_base_multilingual_uncased_pipeline", lang = "xx") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trac2020_iben_a_bert_base_multilingual_uncased_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|xx| +|Size:|627.8 MB| + +## References + +https://huggingface.co/socialmediaie/TRAC2020_IBEN_A_bert-base-multilingual-uncased + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-trac2020_iben_a_bert_base_multilingual_uncased_xx.md b/docs/_posts/ahmedlone127/2024-09-25-trac2020_iben_a_bert_base_multilingual_uncased_xx.md new file mode 100644 index 00000000000000..fc62152f2c8a68 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-trac2020_iben_a_bert_base_multilingual_uncased_xx.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Multilingual trac2020_iben_a_bert_base_multilingual_uncased BertForSequenceClassification from socialmediaie +author: John Snow Labs +name: trac2020_iben_a_bert_base_multilingual_uncased +date: 2024-09-25 +tags: [xx, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: xx +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`trac2020_iben_a_bert_base_multilingual_uncased` is a Multilingual model originally trained by socialmediaie. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/trac2020_iben_a_bert_base_multilingual_uncased_xx_5.5.0_3.0_1727257105132.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/trac2020_iben_a_bert_base_multilingual_uncased_xx_5.5.0_3.0_1727257105132.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("trac2020_iben_a_bert_base_multilingual_uncased","xx") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("trac2020_iben_a_bert_base_multilingual_uncased", "xx") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|trac2020_iben_a_bert_base_multilingual_uncased| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|xx| +|Size:|627.7 MB| + +## References + +https://huggingface.co/socialmediaie/TRAC2020_IBEN_A_bert-base-multilingual-uncased \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-ttpxhunter_en.md b/docs/_posts/ahmedlone127/2024-09-25-ttpxhunter_en.md new file mode 100644 index 00000000000000..f3cff68eb1cf50 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-ttpxhunter_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English ttpxhunter RoBertaForSequenceClassification from nanda-rani +author: John Snow Labs +name: ttpxhunter +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: RoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ttpxhunter` is a English model originally trained by nanda-rani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ttpxhunter_en_5.5.0_3.0_1727234001086.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ttpxhunter_en_5.5.0_3.0_1727234001086.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = RoBertaForSequenceClassification.pretrained("ttpxhunter","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = RoBertaForSequenceClassification.pretrained("ttpxhunter", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ttpxhunter| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|468.9 MB| + +## References + +https://huggingface.co/nanda-rani/TTPXHunter \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-ttpxhunter_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-ttpxhunter_pipeline_en.md new file mode 100644 index 00000000000000..c659eb13dca69c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-ttpxhunter_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English ttpxhunter_pipeline pipeline RoBertaForSequenceClassification from nanda-rani +author: John Snow Labs +name: ttpxhunter_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained RoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`ttpxhunter_pipeline` is a English model originally trained by nanda-rani. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/ttpxhunter_pipeline_en_5.5.0_3.0_1727234024886.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/ttpxhunter_pipeline_en_5.5.0_3.0_1727234024886.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("ttpxhunter_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("ttpxhunter_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|ttpxhunter_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|468.9 MB| + +## References + +https://huggingface.co/nanda-rani/TTPXHunter + +## Included Models + +- DocumentAssembler +- TokenizerModel +- RoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-tupi_bert_large_portuguese_cased_multiclass_multilabel_en.md b/docs/_posts/ahmedlone127/2024-09-25-tupi_bert_large_portuguese_cased_multiclass_multilabel_en.md new file mode 100644 index 00000000000000..3de803ba18fa28 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-tupi_bert_large_portuguese_cased_multiclass_multilabel_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English tupi_bert_large_portuguese_cased_multiclass_multilabel BertForSequenceClassification from FpOliveira +author: John Snow Labs +name: tupi_bert_large_portuguese_cased_multiclass_multilabel +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tupi_bert_large_portuguese_cased_multiclass_multilabel` is a English model originally trained by FpOliveira. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tupi_bert_large_portuguese_cased_multiclass_multilabel_en_5.5.0_3.0_1727242185903.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tupi_bert_large_portuguese_cased_multiclass_multilabel_en_5.5.0_3.0_1727242185903.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("tupi_bert_large_portuguese_cased_multiclass_multilabel","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("tupi_bert_large_portuguese_cased_multiclass_multilabel", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tupi_bert_large_portuguese_cased_multiclass_multilabel| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/FpOliveira/tupi-bert-large-portuguese-cased-multiclass-multilabel \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-tupi_bert_large_portuguese_cased_multiclass_multilabel_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-tupi_bert_large_portuguese_cased_multiclass_multilabel_pipeline_en.md new file mode 100644 index 00000000000000..40083cd3ddfb95 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-tupi_bert_large_portuguese_cased_multiclass_multilabel_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English tupi_bert_large_portuguese_cased_multiclass_multilabel_pipeline pipeline BertForSequenceClassification from FpOliveira +author: John Snow Labs +name: tupi_bert_large_portuguese_cased_multiclass_multilabel_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tupi_bert_large_portuguese_cased_multiclass_multilabel_pipeline` is a English model originally trained by FpOliveira. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tupi_bert_large_portuguese_cased_multiclass_multilabel_pipeline_en_5.5.0_3.0_1727242251028.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tupi_bert_large_portuguese_cased_multiclass_multilabel_pipeline_en_5.5.0_3.0_1727242251028.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("tupi_bert_large_portuguese_cased_multiclass_multilabel_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("tupi_bert_large_portuguese_cased_multiclass_multilabel_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tupi_bert_large_portuguese_cased_multiclass_multilabel_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.3 GB| + +## References + +https://huggingface.co/FpOliveira/tupi-bert-large-portuguese-cased-multiclass-multilabel + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-tupy_bert_base_binary_classifier_pt.md b/docs/_posts/ahmedlone127/2024-09-25-tupy_bert_base_binary_classifier_pt.md new file mode 100644 index 00000000000000..df3e61f6d3df3c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-tupy_bert_base_binary_classifier_pt.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Portuguese tupy_bert_base_binary_classifier BertForSequenceClassification from Silly-Machine +author: John Snow Labs +name: tupy_bert_base_binary_classifier +date: 2024-09-25 +tags: [pt, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`tupy_bert_base_binary_classifier` is a Portuguese model originally trained by Silly-Machine. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/tupy_bert_base_binary_classifier_pt_5.5.0_3.0_1727268815687.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/tupy_bert_base_binary_classifier_pt_5.5.0_3.0_1727268815687.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("tupy_bert_base_binary_classifier","pt") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("tupy_bert_base_binary_classifier", "pt") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|tupy_bert_base_binary_classifier| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|pt| +|Size:|408.2 MB| + +## References + +https://huggingface.co/Silly-Machine/TuPy-Bert-Base-Binary-Classifier \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-turkish_earthquake_tweets_ner_en.md b/docs/_posts/ahmedlone127/2024-09-25-turkish_earthquake_tweets_ner_en.md new file mode 100644 index 00000000000000..5c947a91f14a57 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-turkish_earthquake_tweets_ner_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English turkish_earthquake_tweets_ner BertForTokenClassification from yhaslan +author: John Snow Labs +name: turkish_earthquake_tweets_ner +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`turkish_earthquake_tweets_ner` is a English model originally trained by yhaslan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/turkish_earthquake_tweets_ner_en_5.5.0_3.0_1727249736193.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/turkish_earthquake_tweets_ner_en_5.5.0_3.0_1727249736193.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("turkish_earthquake_tweets_ner","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("turkish_earthquake_tweets_ner", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|turkish_earthquake_tweets_ner| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|412.3 MB| + +## References + +https://huggingface.co/yhaslan/turkish-earthquake-tweets-ner \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-turkish_earthquake_tweets_ner_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-turkish_earthquake_tweets_ner_pipeline_en.md new file mode 100644 index 00000000000000..c5b885c15e02aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-turkish_earthquake_tweets_ner_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English turkish_earthquake_tweets_ner_pipeline pipeline BertForTokenClassification from yhaslan +author: John Snow Labs +name: turkish_earthquake_tweets_ner_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`turkish_earthquake_tweets_ner_pipeline` is a English model originally trained by yhaslan. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/turkish_earthquake_tweets_ner_pipeline_en_5.5.0_3.0_1727249758142.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/turkish_earthquake_tweets_ner_pipeline_en_5.5.0_3.0_1727249758142.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("turkish_earthquake_tweets_ner_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("turkish_earthquake_tweets_ner_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|turkish_earthquake_tweets_ner_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|412.3 MB| + +## References + +https://huggingface.co/yhaslan/turkish-earthquake-tweets-ner + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForTokenClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-turkish_tiny_bert_uncased_offenseval2020_turkish_tr.md b/docs/_posts/ahmedlone127/2024-09-25-turkish_tiny_bert_uncased_offenseval2020_turkish_tr.md new file mode 100644 index 00000000000000..42ca2361569e67 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-turkish_tiny_bert_uncased_offenseval2020_turkish_tr.md @@ -0,0 +1,94 @@ +--- +layout: model +title: Turkish turkish_tiny_bert_uncased_offenseval2020_turkish BertForSequenceClassification from atasoglu +author: John Snow Labs +name: turkish_tiny_bert_uncased_offenseval2020_turkish +date: 2024-09-25 +tags: [tr, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: tr +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`turkish_tiny_bert_uncased_offenseval2020_turkish` is a Turkish model originally trained by atasoglu. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/turkish_tiny_bert_uncased_offenseval2020_turkish_tr_5.5.0_3.0_1727287835043.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/turkish_tiny_bert_uncased_offenseval2020_turkish_tr_5.5.0_3.0_1727287835043.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("turkish_tiny_bert_uncased_offenseval2020_turkish","tr") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("turkish_tiny_bert_uncased_offenseval2020_turkish", "tr") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|turkish_tiny_bert_uncased_offenseval2020_turkish| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|tr| +|Size:|17.5 MB| + +## References + +https://huggingface.co/atasoglu/turkish-tiny-bert-uncased-offenseval2020_tr \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-turkishnewsanalysis_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-turkishnewsanalysis_pipeline_en.md new file mode 100644 index 00000000000000..efd5f7ce5030aa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-turkishnewsanalysis_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English turkishnewsanalysis_pipeline pipeline BertForSequenceClassification from MesutAktas +author: John Snow Labs +name: turkishnewsanalysis_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`turkishnewsanalysis_pipeline` is a English model originally trained by MesutAktas. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/turkishnewsanalysis_pipeline_en_5.5.0_3.0_1727254244524.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/turkishnewsanalysis_pipeline_en_5.5.0_3.0_1727254244524.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("turkishnewsanalysis_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("turkishnewsanalysis_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|turkishnewsanalysis_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|414.6 MB| + +## References + +https://huggingface.co/MesutAktas/TurkishNewsAnalysis + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-twitter_empathy_miles_en.md b/docs/_posts/ahmedlone127/2024-09-25-twitter_empathy_miles_en.md new file mode 100644 index 00000000000000..46d13a1cecb5b8 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-twitter_empathy_miles_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English twitter_empathy_miles BertForSequenceClassification from rxsong +author: John Snow Labs +name: twitter_empathy_miles +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitter_empathy_miles` is a English model originally trained by rxsong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_empathy_miles_en_5.5.0_3.0_1727237576259.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_empathy_miles_en_5.5.0_3.0_1727237576259.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("twitter_empathy_miles","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("twitter_empathy_miles", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_empathy_miles| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/rxsong/twitter_empathy_Miles \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-twitter_empathy_miles_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-twitter_empathy_miles_pipeline_en.md new file mode 100644 index 00000000000000..c6e379ca9d39b2 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-twitter_empathy_miles_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English twitter_empathy_miles_pipeline pipeline BertForSequenceClassification from rxsong +author: John Snow Labs +name: twitter_empathy_miles_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitter_empathy_miles_pipeline` is a English model originally trained by rxsong. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_empathy_miles_pipeline_en_5.5.0_3.0_1727237597262.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_empathy_miles_pipeline_en_5.5.0_3.0_1727237597262.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("twitter_empathy_miles_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("twitter_empathy_miles_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_empathy_miles_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|405.9 MB| + +## References + +https://huggingface.co/rxsong/twitter_empathy_Miles + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-twitter_sentiment_analysis_en.md b/docs/_posts/ahmedlone127/2024-09-25-twitter_sentiment_analysis_en.md new file mode 100644 index 00000000000000..644a3b5ed86236 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-twitter_sentiment_analysis_en.md @@ -0,0 +1,98 @@ +--- +layout: model +title: English twitter_sentiment_analysis DistilBertForSequenceClassification from vickylin21 +author: John Snow Labs +name: twitter_sentiment_analysis +date: 2024-09-25 +tags: [bert, en, open_source, sequence_classification, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained DistilBertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitter_sentiment_analysis` is a English model originally trained by vickylin21. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_sentiment_analysis_en_5.5.0_3.0_1727277654806.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_sentiment_analysis_en_5.5.0_3.0_1727277654806.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +document_assembler = DocumentAssembler()\ + .setInputCol("text")\ + .setOutputCol("document") + +tokenizer = Tokenizer()\ + .setInputCols("document")\ + .setOutputCol("token") + +sequenceClassifier = DistilBertForSequenceClassification.pretrained("twitter_sentiment_analysis","en")\ + .setInputCols(["document","token"])\ + .setOutputCol("class") + +pipeline = Pipeline().setStages([document_assembler, tokenizer, sequenceClassifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val document_assembler = new DocumentAssembler() + .setInputCol("text") + .setOutputCol("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val sequenceClassifier = DistilBertForSequenceClassification.pretrained("twitter_sentiment_analysis","en") + .setInputCols(Array("document","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_sentiment_analysis| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.4 MB| + +## References + +References + +https://huggingface.co/vickylin21/Twitter_sentiment_analysis \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-twitter_xlm_roberta_base_sentiment_en.md b/docs/_posts/ahmedlone127/2024-09-25-twitter_xlm_roberta_base_sentiment_en.md new file mode 100644 index 00000000000000..aee1f003300787 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-twitter_xlm_roberta_base_sentiment_en.md @@ -0,0 +1,88 @@ +--- +layout: model +title: twitter_xlm_roberta_base_sentiment(Cardiff nlp) (Veer) +author: John Snow Labs +name: twitter_xlm_roberta_base_sentiment +date: 2024-09-25 +tags: [en, open_source, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +This is a multilingual XLM-roBERTa-base model trained on ~198M tweets and finetuned for sentiment analysis. The sentiment fine-tuning was done on 8 languages (Ar, En, Fr, De, Hi, It, Sp, Pt) but it can be used for more languages (see paper for details). + +Paper: XLM-T: A Multilingual Language Model Toolkit for Twitter. +Git Repo: XLM-T official repository. +This model has been integrated into the TweetNLP library. + +## Predicted Entities + + + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_xlm_roberta_base_sentiment_en_5.5.0_3.0_1727229581766.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_xlm_roberta_base_sentiment_en_5.5.0_3.0_1727229581766.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +from pyspark.ml import Pipeline + +document_assembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained('twitter_xlm_roberta_base_sentiment')\ + .setInputCols(["document",'token'])\ + .setOutputCol("class") + +pipeline = Pipeline(stages=[ + document_assembler, + tokenizer, + sequenceClassifier +]) + +# couple of simple examples +example = spark.createDataFrame([['사랑해!'], ["T'estimo! ❤️"], ["I love you!"], ['Mahal kita!']]).toDF("text") + +result = pipeline.fit(example).transform(example) + +# result is a DataFrame +result.select("text", "class.result").show() +``` + +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_xlm_roberta_base_sentiment| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|1.0 GB| \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-twitter_xlm_roberta_base_sentiment_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-twitter_xlm_roberta_base_sentiment_pipeline_en.md new file mode 100644 index 00000000000000..79a96647e2c87d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-twitter_xlm_roberta_base_sentiment_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English twitter_xlm_roberta_base_sentiment_pipeline pipeline XlmRoBertaForSequenceClassification from cardiffnlp +author: John Snow Labs +name: twitter_xlm_roberta_base_sentiment_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`twitter_xlm_roberta_base_sentiment_pipeline` is a English model originally trained by cardiffnlp. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/twitter_xlm_roberta_base_sentiment_pipeline_en_5.5.0_3.0_1727229633590.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/twitter_xlm_roberta_base_sentiment_pipeline_en_5.5.0_3.0_1727229633590.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("twitter_xlm_roberta_base_sentiment_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("twitter_xlm_roberta_base_sentiment_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|twitter_xlm_roberta_base_sentiment_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.0 GB| + +## References + +https://huggingface.co/cardiffnlp/twitter-xlm-roberta-base-sentiment + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-valueeval24_bert_baseline_english_en.md b/docs/_posts/ahmedlone127/2024-09-25-valueeval24_bert_baseline_english_en.md new file mode 100644 index 00000000000000..6be7a376a61ad3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-valueeval24_bert_baseline_english_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English valueeval24_bert_baseline_english BertForSequenceClassification from JohannesKiesel +author: John Snow Labs +name: valueeval24_bert_baseline_english +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, bert] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`valueeval24_bert_baseline_english` is a English model originally trained by JohannesKiesel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/valueeval24_bert_baseline_english_en_5.5.0_3.0_1727284648180.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/valueeval24_bert_baseline_english_en_5.5.0_3.0_1727284648180.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = BertForSequenceClassification.pretrained("valueeval24_bert_baseline_english","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = BertForSequenceClassification.pretrained("valueeval24_bert_baseline_english", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|valueeval24_bert_baseline_english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|409.5 MB| + +## References + +https://huggingface.co/JohannesKiesel/valueeval24-bert-baseline-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-wb_charcs_extraction_en.md b/docs/_posts/ahmedlone127/2024-09-25-wb_charcs_extraction_en.md new file mode 100644 index 00000000000000..46a5d83704b7fb --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-wb_charcs_extraction_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English wb_charcs_extraction BertForTokenClassification from vkimbris +author: John Snow Labs +name: wb_charcs_extraction +date: 2024-09-25 +tags: [en, open_source, onnx, token_classification, bert, ner] +task: Named Entity Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: BertForTokenClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForTokenClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`wb_charcs_extraction` is a English model originally trained by vkimbris. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/wb_charcs_extraction_en_5.5.0_3.0_1727271968225.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/wb_charcs_extraction_en_5.5.0_3.0_1727271968225.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +tokenClassifier = BertForTokenClassification.pretrained("wb_charcs_extraction","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("ner") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, tokenClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val tokenClassifier = BertForTokenClassification.pretrained("wb_charcs_extraction", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("ner") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, tokenClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|wb_charcs_extraction| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[ner]| +|Language:|en| +|Size:|661.9 MB| + +## References + +https://huggingface.co/vkimbris/wb-charcs-extraction \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-whisper_dutch_nl.md b/docs/_posts/ahmedlone127/2024-09-25-whisper_dutch_nl.md new file mode 100644 index 00000000000000..14cbde5ffa6927 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-whisper_dutch_nl.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Dutch, Flemish whisper_dutch WhisperForCTC from hannatoenbreker +author: John Snow Labs +name: whisper_dutch +date: 2024-09-25 +tags: [nl, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: nl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_dutch` is a Dutch, Flemish model originally trained by hannatoenbreker. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_dutch_nl_5.5.0_3.0_1727226938874.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_dutch_nl_5.5.0_3.0_1727226938874.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_dutch","nl") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_dutch", "nl") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_dutch| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|nl| +|Size:|1.7 GB| + +## References + +https://huggingface.co/hannatoenbreker/whisper-dutch \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-whisper_dutch_pipeline_nl.md b/docs/_posts/ahmedlone127/2024-09-25-whisper_dutch_pipeline_nl.md new file mode 100644 index 00000000000000..1b1537bdb3b6ac --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-whisper_dutch_pipeline_nl.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Dutch, Flemish whisper_dutch_pipeline pipeline WhisperForCTC from hannatoenbreker +author: John Snow Labs +name: whisper_dutch_pipeline +date: 2024-09-25 +tags: [nl, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: nl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_dutch_pipeline` is a Dutch, Flemish model originally trained by hannatoenbreker. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_dutch_pipeline_nl_5.5.0_3.0_1727227027051.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_dutch_pipeline_nl_5.5.0_3.0_1727227027051.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_dutch_pipeline", lang = "nl") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_dutch_pipeline", lang = "nl") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_dutch_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|nl| +|Size:|1.7 GB| + +## References + +https://huggingface.co/hannatoenbreker/whisper-dutch + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-whisper_small_inbrowser_proctor_en.md b/docs/_posts/ahmedlone127/2024-09-25-whisper_small_inbrowser_proctor_en.md new file mode 100644 index 00000000000000..571d31583b59ab --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-whisper_small_inbrowser_proctor_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_inbrowser_proctor WhisperForCTC from lord-reso +author: John Snow Labs +name: whisper_small_inbrowser_proctor +date: 2024-09-25 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_inbrowser_proctor` is a English model originally trained by lord-reso. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_inbrowser_proctor_en_5.5.0_3.0_1727226765800.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_inbrowser_proctor_en_5.5.0_3.0_1727226765800.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_inbrowser_proctor","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_inbrowser_proctor", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_inbrowser_proctor| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/lord-reso/whisper-small-inbrowser-proctor \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-whisper_small_inbrowser_proctor_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-whisper_small_inbrowser_proctor_pipeline_en.md new file mode 100644 index 00000000000000..1beab7a3038901 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-whisper_small_inbrowser_proctor_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_inbrowser_proctor_pipeline pipeline WhisperForCTC from lord-reso +author: John Snow Labs +name: whisper_small_inbrowser_proctor_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_inbrowser_proctor_pipeline` is a English model originally trained by lord-reso. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_inbrowser_proctor_pipeline_en_5.5.0_3.0_1727226850641.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_inbrowser_proctor_pipeline_en_5.5.0_3.0_1727226850641.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_inbrowser_proctor_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_inbrowser_proctor_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_inbrowser_proctor_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/lord-reso/whisper-small-inbrowser-proctor + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-whisper_small_khmer_km.md b/docs/_posts/ahmedlone127/2024-09-25-whisper_small_khmer_km.md new file mode 100644 index 00000000000000..d44045a4bb35e4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-whisper_small_khmer_km.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Central Khmer, Khmer whisper_small_khmer WhisperForCTC from seanghay +author: John Snow Labs +name: whisper_small_khmer +date: 2024-09-25 +tags: [km, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: km +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_khmer` is a Central Khmer, Khmer model originally trained by seanghay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_khmer_km_5.5.0_3.0_1727224018090.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_khmer_km_5.5.0_3.0_1727224018090.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_khmer","km") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_khmer", "km") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_khmer| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|km| +|Size:|1.7 GB| + +## References + +https://huggingface.co/seanghay/whisper-small-khmer \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-whisper_small_khmer_pipeline_km.md b/docs/_posts/ahmedlone127/2024-09-25-whisper_small_khmer_pipeline_km.md new file mode 100644 index 00000000000000..cf10a7862e11f3 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-whisper_small_khmer_pipeline_km.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Central Khmer, Khmer whisper_small_khmer_pipeline pipeline WhisperForCTC from seanghay +author: John Snow Labs +name: whisper_small_khmer_pipeline +date: 2024-09-25 +tags: [km, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: km +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_khmer_pipeline` is a Central Khmer, Khmer model originally trained by seanghay. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_khmer_pipeline_km_5.5.0_3.0_1727224111359.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_khmer_pipeline_km_5.5.0_3.0_1727224111359.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_khmer_pipeline", lang = "km") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_khmer_pipeline", lang = "km") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_khmer_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|km| +|Size:|1.7 GB| + +## References + +https://huggingface.co/seanghay/whisper-small-khmer + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-whisper_small_portuguese_jlondonobo_pipeline_pt.md b/docs/_posts/ahmedlone127/2024-09-25-whisper_small_portuguese_jlondonobo_pipeline_pt.md new file mode 100644 index 00000000000000..af97022d5e0f3c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-whisper_small_portuguese_jlondonobo_pipeline_pt.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Portuguese whisper_small_portuguese_jlondonobo_pipeline pipeline WhisperForCTC from jlondonobo +author: John Snow Labs +name: whisper_small_portuguese_jlondonobo_pipeline +date: 2024-09-25 +tags: [pt, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_portuguese_jlondonobo_pipeline` is a Portuguese model originally trained by jlondonobo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_portuguese_jlondonobo_pipeline_pt_5.5.0_3.0_1727224457904.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_portuguese_jlondonobo_pipeline_pt_5.5.0_3.0_1727224457904.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_portuguese_jlondonobo_pipeline", lang = "pt") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_portuguese_jlondonobo_pipeline", lang = "pt") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_portuguese_jlondonobo_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|pt| +|Size:|1.7 GB| + +## References + +https://huggingface.co/jlondonobo/whisper-small-pt + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-whisper_small_portuguese_jlondonobo_pt.md b/docs/_posts/ahmedlone127/2024-09-25-whisper_small_portuguese_jlondonobo_pt.md new file mode 100644 index 00000000000000..e6f723ed1a4fe7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-whisper_small_portuguese_jlondonobo_pt.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Portuguese whisper_small_portuguese_jlondonobo WhisperForCTC from jlondonobo +author: John Snow Labs +name: whisper_small_portuguese_jlondonobo +date: 2024-09-25 +tags: [pt, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_portuguese_jlondonobo` is a Portuguese model originally trained by jlondonobo. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_portuguese_jlondonobo_pt_5.5.0_3.0_1727224365258.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_portuguese_jlondonobo_pt_5.5.0_3.0_1727224365258.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_portuguese_jlondonobo","pt") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_portuguese_jlondonobo", "pt") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_portuguese_jlondonobo| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|pt| +|Size:|1.7 GB| + +## References + +https://huggingface.co/jlondonobo/whisper-small-pt \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-whisper_small_portuguese_pedropauletti_pipeline_pt.md b/docs/_posts/ahmedlone127/2024-09-25-whisper_small_portuguese_pedropauletti_pipeline_pt.md new file mode 100644 index 00000000000000..b12a03597bd2af --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-whisper_small_portuguese_pedropauletti_pipeline_pt.md @@ -0,0 +1,69 @@ +--- +layout: model +title: Portuguese whisper_small_portuguese_pedropauletti_pipeline pipeline WhisperForCTC from pedropauletti +author: John Snow Labs +name: whisper_small_portuguese_pedropauletti_pipeline +date: 2024-09-25 +tags: [pt, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_portuguese_pedropauletti_pipeline` is a Portuguese model originally trained by pedropauletti. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_portuguese_pedropauletti_pipeline_pt_5.5.0_3.0_1727228230723.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_portuguese_pedropauletti_pipeline_pt_5.5.0_3.0_1727228230723.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_portuguese_pedropauletti_pipeline", lang = "pt") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_portuguese_pedropauletti_pipeline", lang = "pt") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_portuguese_pedropauletti_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|pt| +|Size:|1.7 GB| + +## References + +https://huggingface.co/pedropauletti/whisper-small-pt + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-whisper_small_portuguese_pedropauletti_pt.md b/docs/_posts/ahmedlone127/2024-09-25-whisper_small_portuguese_pedropauletti_pt.md new file mode 100644 index 00000000000000..ded2ecb95c7490 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-whisper_small_portuguese_pedropauletti_pt.md @@ -0,0 +1,84 @@ +--- +layout: model +title: Portuguese whisper_small_portuguese_pedropauletti WhisperForCTC from pedropauletti +author: John Snow Labs +name: whisper_small_portuguese_pedropauletti +date: 2024-09-25 +tags: [pt, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: pt +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_portuguese_pedropauletti` is a Portuguese model originally trained by pedropauletti. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_portuguese_pedropauletti_pt_5.5.0_3.0_1727228145048.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_portuguese_pedropauletti_pt_5.5.0_3.0_1727228145048.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_portuguese_pedropauletti","pt") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_portuguese_pedropauletti", "pt") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_portuguese_pedropauletti| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|pt| +|Size:|1.7 GB| + +## References + +https://huggingface.co/pedropauletti/whisper-small-pt \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-whisper_small_turkish_istech_en.md b/docs/_posts/ahmedlone127/2024-09-25-whisper_small_turkish_istech_en.md new file mode 100644 index 00000000000000..d8ad5d3076865b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-whisper_small_turkish_istech_en.md @@ -0,0 +1,84 @@ +--- +layout: model +title: English whisper_small_turkish_istech WhisperForCTC from muratsimsek003 +author: John Snow Labs +name: whisper_small_turkish_istech +date: 2024-09-25 +tags: [en, open_source, onnx, asr, whisper] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: WhisperForCTC +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_turkish_istech` is a English model originally trained by muratsimsek003. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_turkish_istech_en_5.5.0_3.0_1727225876743.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_turkish_istech_en_5.5.0_3.0_1727225876743.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +audioAssembler = AudioAssembler() \ + .setInputCol("audio_content") \ + .setOutputCol("audio_assembler") + +speechToText = WhisperForCTC.pretrained("whisper_small_turkish_istech","en") \ + .setInputCols(["audio_assembler"]) \ + .setOutputCol("text") + +pipeline = Pipeline().setStages([audioAssembler, speechToText]) +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val audioAssembler = new DocumentAssembler() + .setInputCols("audio_content") + .setOutputCols("audio_assembler") + +val speechToText = WhisperForCTC.pretrained("whisper_small_turkish_istech", "en") + .setInputCols(Array("audio_assembler")) + .setOutputCol("text") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, speechToText)) +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_turkish_istech| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[audio_assembler]| +|Output Labels:|[text]| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/muratsimsek003/whisper-small-tr-istech \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-whisper_small_turkish_istech_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-whisper_small_turkish_istech_pipeline_en.md new file mode 100644 index 00000000000000..13af8e1cb7d73f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-whisper_small_turkish_istech_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_small_turkish_istech_pipeline pipeline WhisperForCTC from muratsimsek003 +author: John Snow Labs +name: whisper_small_turkish_istech_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_small_turkish_istech_pipeline` is a English model originally trained by muratsimsek003. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_small_turkish_istech_pipeline_en_5.5.0_3.0_1727225973702.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_small_turkish_istech_pipeline_en_5.5.0_3.0_1727225973702.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_small_turkish_istech_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_small_turkish_istech_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_small_turkish_istech_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|1.7 GB| + +## References + +https://huggingface.co/muratsimsek003/whisper-small-tr-istech + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-whisper_test_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-whisper_test_pipeline_en.md new file mode 100644 index 00000000000000..3173f244becd3b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-whisper_test_pipeline_en.md @@ -0,0 +1,69 @@ +--- +layout: model +title: English whisper_test_pipeline pipeline WhisperForCTC from SamagraDataGov +author: John Snow Labs +name: whisper_test_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Automatic Speech Recognition +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained WhisperForCTC, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`whisper_test_pipeline` is a English model originally trained by SamagraDataGov. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/whisper_test_pipeline_en_5.5.0_3.0_1727224748193.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/whisper_test_pipeline_en_5.5.0_3.0_1727224748193.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("whisper_test_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("whisper_test_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|whisper_test_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.0 MB| + +## References + +https://huggingface.co/SamagraDataGov/whisper-test + +## Included Models + +- AudioAssembler +- WhisperForCTC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-word2affect_dutch_pipeline_nl.md b/docs/_posts/ahmedlone127/2024-09-25-word2affect_dutch_pipeline_nl.md new file mode 100644 index 00000000000000..a153e6ed00f871 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-word2affect_dutch_pipeline_nl.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Dutch, Flemish word2affect_dutch_pipeline pipeline BertForSequenceClassification from hplisiecki +author: John Snow Labs +name: word2affect_dutch_pipeline +date: 2024-09-25 +tags: [nl, open_source, pipeline, onnx] +task: Text Classification +language: nl +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`word2affect_dutch_pipeline` is a Dutch, Flemish model originally trained by hplisiecki. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/word2affect_dutch_pipeline_nl_5.5.0_3.0_1727265918885.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/word2affect_dutch_pipeline_nl_5.5.0_3.0_1727265918885.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("word2affect_dutch_pipeline", lang = "nl") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("word2affect_dutch_pipeline", lang = "nl") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|word2affect_dutch_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|nl| +|Size:|409.3 MB| + +## References + +https://huggingface.co/hplisiecki/word2affect_dutch + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_final_vietnam_aug_insert_bert_2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_final_vietnam_aug_insert_bert_2_pipeline_en.md new file mode 100644 index 00000000000000..189b648b0bce41 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_final_vietnam_aug_insert_bert_2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_final_vietnam_aug_insert_bert_2_pipeline pipeline XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_final_vietnam_aug_insert_bert_2_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_final_vietnam_aug_insert_bert_2_pipeline` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_vietnam_aug_insert_bert_2_pipeline_en_5.5.0_3.0_1727228848305.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_vietnam_aug_insert_bert_2_pipeline_en_5.5.0_3.0_1727228848305.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_final_vietnam_aug_insert_bert_2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_final_vietnam_aug_insert_bert_2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_final_vietnam_aug_insert_bert_2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|794.2 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Final_VietNam-aug_insert_BERT-2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_final_vietnam_aug_replace_bert_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_final_vietnam_aug_replace_bert_pipeline_en.md new file mode 100644 index 00000000000000..f25c16826faa25 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_final_vietnam_aug_replace_bert_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_final_vietnam_aug_replace_bert_pipeline pipeline XlmRoBertaForSequenceClassification from ThuyNT03 +author: John Snow Labs +name: xlm_roberta_base_final_vietnam_aug_replace_bert_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_final_vietnam_aug_replace_bert_pipeline` is a English model originally trained by ThuyNT03. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_vietnam_aug_replace_bert_pipeline_en_5.5.0_3.0_1727229058216.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_final_vietnam_aug_replace_bert_pipeline_en_5.5.0_3.0_1727229058216.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_final_vietnam_aug_replace_bert_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_final_vietnam_aug_replace_bert_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_final_vietnam_aug_replace_bert_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|794.4 MB| + +## References + +https://huggingface.co/ThuyNT03/xlm-roberta-base-Final_VietNam-aug_replace_BERT + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_finetuned_marc_lewtun_en.md b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_finetuned_marc_lewtun_en.md new file mode 100644 index 00000000000000..67df13092b333c --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_finetuned_marc_lewtun_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_marc_lewtun XlmRoBertaForSequenceClassification from lewtun +author: John Snow Labs +name: xlm_roberta_base_finetuned_marc_lewtun +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_marc_lewtun` is a English model originally trained by lewtun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_marc_lewtun_en_5.5.0_3.0_1727228952110.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_marc_lewtun_en_5.5.0_3.0_1727228952110.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_marc_lewtun","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_marc_lewtun", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_marc_lewtun| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|846.8 MB| + +## References + +https://huggingface.co/lewtun/xlm-roberta-base-finetuned-marc \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_finetuned_marc_lewtun_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_finetuned_marc_lewtun_pipeline_en.md new file mode 100644 index 00000000000000..21bfed335616b4 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_finetuned_marc_lewtun_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_marc_lewtun_pipeline pipeline XlmRoBertaForSequenceClassification from lewtun +author: John Snow Labs +name: xlm_roberta_base_finetuned_marc_lewtun_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_marc_lewtun_pipeline` is a English model originally trained by lewtun. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_marc_lewtun_pipeline_en_5.5.0_3.0_1727229035417.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_marc_lewtun_pipeline_en_5.5.0_3.0_1727229035417.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_finetuned_marc_lewtun_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_finetuned_marc_lewtun_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_marc_lewtun_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|846.8 MB| + +## References + +https://huggingface.co/lewtun/xlm-roberta-base-finetuned-marc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_finetuned_misogyny_sexism_en.md b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_finetuned_misogyny_sexism_en.md new file mode 100644 index 00000000000000..bc42bc1e4cba89 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_finetuned_misogyny_sexism_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_finetuned_misogyny_sexism XlmRoBertaForSequenceClassification from annahaz +author: John Snow Labs +name: xlm_roberta_base_finetuned_misogyny_sexism +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_finetuned_misogyny_sexism` is a English model originally trained by annahaz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_misogyny_sexism_en_5.5.0_3.0_1727229178558.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_finetuned_misogyny_sexism_en_5.5.0_3.0_1727229178558.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_misogyny_sexism","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_finetuned_misogyny_sexism", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_finetuned_misogyny_sexism| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|877.4 MB| + +## References + +https://huggingface.co/annahaz/xlm-roberta-base-finetuned-misogyny-sexism \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_germeval21_toxic_en.md b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_germeval21_toxic_en.md new file mode 100644 index 00000000000000..f1adcb1733557b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_germeval21_toxic_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_germeval21_toxic XlmRoBertaForSequenceClassification from airKlizz +author: John Snow Labs +name: xlm_roberta_base_germeval21_toxic +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_germeval21_toxic` is a English model originally trained by airKlizz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_germeval21_toxic_en_5.5.0_3.0_1727228708588.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_germeval21_toxic_en_5.5.0_3.0_1727228708588.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_germeval21_toxic","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_germeval21_toxic", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_germeval21_toxic| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|784.9 MB| + +## References + +https://huggingface.co/airKlizz/xlm-roberta-base-germeval21-toxic \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_germeval21_toxic_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_germeval21_toxic_pipeline_en.md new file mode 100644 index 00000000000000..fc1a1fffff613f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_germeval21_toxic_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_germeval21_toxic_pipeline pipeline XlmRoBertaForSequenceClassification from airKlizz +author: John Snow Labs +name: xlm_roberta_base_germeval21_toxic_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_germeval21_toxic_pipeline` is a English model originally trained by airKlizz. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_germeval21_toxic_pipeline_en_5.5.0_3.0_1727228858238.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_germeval21_toxic_pipeline_en_5.5.0_3.0_1727228858238.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_germeval21_toxic_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_germeval21_toxic_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_germeval21_toxic_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|784.9 MB| + +## References + +https://huggingface.co/airKlizz/xlm-roberta-base-germeval21-toxic + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_irumozhi_pipeline_ta.md b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_irumozhi_pipeline_ta.md new file mode 100644 index 00000000000000..84bdb91f16c7fa --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_irumozhi_pipeline_ta.md @@ -0,0 +1,70 @@ +--- +layout: model +title: Tamil xlm_roberta_base_irumozhi_pipeline pipeline XlmRoBertaForSequenceClassification from aryaman +author: John Snow Labs +name: xlm_roberta_base_irumozhi_pipeline +date: 2024-09-25 +tags: [ta, open_source, pipeline, onnx] +task: Text Classification +language: ta +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_irumozhi_pipeline` is a Tamil model originally trained by aryaman. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_irumozhi_pipeline_ta_5.5.0_3.0_1727229375117.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_irumozhi_pipeline_ta_5.5.0_3.0_1727229375117.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_irumozhi_pipeline", lang = "ta") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_irumozhi_pipeline", lang = "ta") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_irumozhi_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|ta| +|Size:|773.9 MB| + +## References + +https://huggingface.co/aryaman/xlm-roberta-base-irumozhi + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train_en.md b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train_en.md new file mode 100644 index 00000000000000..d25974525f95f5 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train XlmRoBertaForSequenceClassification from shanhy +author: John Snow Labs +name: xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train` is a English model originally trained by shanhy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train_en_5.5.0_3.0_1727228876329.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train_en_5.5.0_3.0_1727228876329.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|832.3 MB| + +## References + +https://huggingface.co/shanhy/xlm-roberta-base_lr0.001_seed42_amh-esp-eng_train \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train_pipeline_en.md new file mode 100644 index 00000000000000..44e126a969b97b --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train_pipeline pipeline XlmRoBertaForSequenceClassification from shanhy +author: John Snow Labs +name: xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train_pipeline` is a English model originally trained by shanhy. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train_pipeline_en_5.5.0_3.0_1727228956743.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train_pipeline_en_5.5.0_3.0_1727228956743.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_lr0_001_seed42_amh_esp_eng_train_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|832.3 MB| + +## References + +https://huggingface.co/shanhy/xlm-roberta-base_lr0.001_seed42_amh-esp-eng_train + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_sentiment_classification_test_v2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_sentiment_classification_test_v2_pipeline_en.md new file mode 100644 index 00000000000000..23d29821e92d4f --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_sentiment_classification_test_v2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_sentiment_classification_test_v2_pipeline pipeline BertForSequenceClassification from pnr-svc +author: John Snow Labs +name: xlm_roberta_base_sentiment_classification_test_v2_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_sentiment_classification_test_v2_pipeline` is a English model originally trained by pnr-svc. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_sentiment_classification_test_v2_pipeline_en_5.5.0_3.0_1727266531365.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_sentiment_classification_test_v2_pipeline_en_5.5.0_3.0_1727266531365.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_sentiment_classification_test_v2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_sentiment_classification_test_v2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_sentiment_classification_test_v2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|414.5 MB| + +## References + +https://huggingface.co/pnr-svc/xlm-roberta-base-sentiment-classification_test_V2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_sst2_10_en.md b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_sst2_10_en.md new file mode 100644 index 00000000000000..d6cb2b93e47f3d --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_sst2_10_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_sst2_10 XlmRoBertaForSequenceClassification from tmnam20 +author: John Snow Labs +name: xlm_roberta_base_sst2_10 +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_sst2_10` is a English model originally trained by tmnam20. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_sst2_10_en_5.5.0_3.0_1727229679712.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_sst2_10_en_5.5.0_3.0_1727229679712.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_sst2_10","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_sst2_10", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_sst2_10| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|779.4 MB| + +## References + +https://huggingface.co/tmnam20/xlm-roberta-base-sst2-10 \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english_en.md b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english_en.md new file mode 100644 index 00000000000000..2851ffe0f4c172 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english_en_5.5.0_3.0_1727228558441.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english_en_5.5.0_3.0_1727228558441.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|390.0 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-en-30000-tweet-sentiment-en \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english_pipeline_en.md new file mode 100644 index 00000000000000..3b1a9c0f206952 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english_pipeline pipeline XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english_pipeline` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english_pipeline_en_5.5.0_3.0_1727228586507.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english_pipeline_en_5.5.0_3.0_1727228586507.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_english_30000_tweet_sentiment_english_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|390.0 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-en-30000-tweet-sentiment-en + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_trimmed_italian_30000_tweet_sentiment_italian_en.md b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_trimmed_italian_30000_tweet_sentiment_italian_en.md new file mode 100644 index 00000000000000..c5130f4a59b646 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-xlm_roberta_base_trimmed_italian_30000_tweet_sentiment_italian_en.md @@ -0,0 +1,94 @@ +--- +layout: model +title: English xlm_roberta_base_trimmed_italian_30000_tweet_sentiment_italian XlmRoBertaForSequenceClassification from vocabtrimmer +author: John Snow Labs +name: xlm_roberta_base_trimmed_italian_30000_tweet_sentiment_italian +date: 2024-09-25 +tags: [en, open_source, onnx, sequence_classification, xlm_roberta] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlm_roberta_base_trimmed_italian_30000_tweet_sentiment_italian` is a English model originally trained by vocabtrimmer. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_italian_30000_tweet_sentiment_italian_en_5.5.0_3.0_1727228710743.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlm_roberta_base_trimmed_italian_30000_tweet_sentiment_italian_en_5.5.0_3.0_1727228710743.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +documentAssembler = DocumentAssembler() \ + .setInputCol('text') \ + .setOutputCol('document') + +tokenizer = Tokenizer() \ + .setInputCols(['document']) \ + .setOutputCol('token') + +sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_italian_30000_tweet_sentiment_italian","en") \ + .setInputCols(["documents","token"]) \ + .setOutputCol("class") + +pipeline = Pipeline().setStages([documentAssembler, tokenizer, sequenceClassifier]) +data = spark.createDataFrame([["I love spark-nlp"]]).toDF("text") +pipelineModel = pipeline.fit(data) +pipelineDF = pipelineModel.transform(data) + +``` +```scala + +val documentAssembler = new DocumentAssembler() + .setInputCols("text") + .setOutputCols("document") + +val tokenizer = new Tokenizer() + .setInputCols(Array("document")) + .setOutputCol("token") + +val sequenceClassifier = XlmRoBertaForSequenceClassification.pretrained("xlm_roberta_base_trimmed_italian_30000_tweet_sentiment_italian", "en") + .setInputCols(Array("documents","token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, sequenceClassifier)) +val data = Seq("I love spark-nlp").toDS.toDF("text") +val pipelineModel = pipeline.fit(data) +val pipelineDF = pipelineModel.transform(data) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlm_roberta_base_trimmed_italian_30000_tweet_sentiment_italian| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|388.9 MB| + +## References + +https://huggingface.co/vocabtrimmer/xlm-roberta-base-trimmed-it-30000-tweet-sentiment-it \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-xlmroberta_classifier_base_mrpc_en.md b/docs/_posts/ahmedlone127/2024-09-25-xlmroberta_classifier_base_mrpc_en.md new file mode 100644 index 00000000000000..be0311740fd4fd --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-xlmroberta_classifier_base_mrpc_en.md @@ -0,0 +1,105 @@ +--- +layout: model +title: English XlmRobertaForSequenceClassification Base Cased model (from Intel) +author: John Snow Labs +name: xlmroberta_classifier_base_mrpc +date: 2024-09-25 +tags: [en, open_source, xlm_roberta, sequence_classification, classification, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +engine: onnx +annotator: XlmRoBertaForSequenceClassification +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRobertaForSequenceClassification model, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP. `xlm-roberta-base-mrpc` is a English model originally trained by `Intel`. + +## Predicted Entities + +`equivalent`, `not_equivalent` + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_classifier_base_mrpc_en_5.5.0_3.0_1727229909505.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_classifier_base_mrpc_en_5.5.0_3.0_1727229909505.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python +documentAssembler = DocumentAssembler() \ + .setInputCol("text") \ + .setOutputCol("document") + +tokenizer = Tokenizer() \ + .setInputCols("document") \ + .setOutputCol("token") + +seq_classifier = XlmRoBertaForSequenceClassification.pretrained("xlmroberta_classifier_base_mrpc","en") \ + .setInputCols(["document", "token"]) \ + .setOutputCol("class") + +pipeline = Pipeline(stages=[documentAssembler, tokenizer, seq_classifier]) + +data = spark.createDataFrame([["PUT YOUR STRING HERE"]]).toDF("text") + +result = pipeline.fit(data).transform(data) +``` +```scala +val documentAssembler = new DocumentAssembler() + .setInputCols(Array("text")) + .setOutputCols(Array("document")) + +val tokenizer = new Tokenizer() + .setInputCols("document") + .setOutputCol("token") + +val seq_classifier = XlmRoBertaForSequenceClassification.pretrained("xlmroberta_classifier_base_mrpc","en") + .setInputCols(Array("document", "token")) + .setOutputCol("class") + +val pipeline = new Pipeline().setStages(Array(documentAssembler, tokenizer, seq_classifier)) + +val data = Seq("PUT YOUR STRING HERE").toDS.toDF("text") + +val result = pipeline.fit(data).transform(data) +``` + +{:.nlu-block} +```python +import nlu +nlu.load("en.classify.xlmr_roberta.glue.base").predict("""PUT YOUR STRING HERE""") +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_classifier_base_mrpc| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Input Labels:|[document, token]| +|Output Labels:|[class]| +|Language:|en| +|Size:|787.7 MB| + +## References + +References + +- https://huggingface.co/Intel/xlm-roberta-base-mrpc +- https://paperswithcode.com/sota?task=Text+Classification&dataset=GLUE+MRPC \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-xlmroberta_classifier_base_mrpc_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-xlmroberta_classifier_base_mrpc_pipeline_en.md new file mode 100644 index 00000000000000..c21f16728e31de --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-xlmroberta_classifier_base_mrpc_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English xlmroberta_classifier_base_mrpc_pipeline pipeline XlmRoBertaForSequenceClassification from Intel +author: John Snow Labs +name: xlmroberta_classifier_base_mrpc_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained XlmRoBertaForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`xlmroberta_classifier_base_mrpc_pipeline` is a English model originally trained by Intel. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/xlmroberta_classifier_base_mrpc_pipeline_en_5.5.0_3.0_1727230042885.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/xlmroberta_classifier_base_mrpc_pipeline_en_5.5.0_3.0_1727230042885.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("xlmroberta_classifier_base_mrpc_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("xlmroberta_classifier_base_mrpc_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|xlmroberta_classifier_base_mrpc_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|787.8 MB| + +## References + +https://huggingface.co/Intel/xlm-roberta-base-mrpc + +## Included Models + +- DocumentAssembler +- TokenizerModel +- XlmRoBertaForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-yahoo1_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-yahoo1_pipeline_en.md new file mode 100644 index 00000000000000..9bfc9ab142eec7 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-yahoo1_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English yahoo1_pipeline pipeline BertForSequenceClassification from Lumos +author: John Snow Labs +name: yahoo1_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`yahoo1_pipeline` is a English model originally trained by Lumos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/yahoo1_pipeline_en_5.5.0_3.0_1727272725736.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/yahoo1_pipeline_en_5.5.0_3.0_1727272725736.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("yahoo1_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("yahoo1_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|yahoo1_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Lumos/yahoo1 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file diff --git a/docs/_posts/ahmedlone127/2024-09-25-yahoo2_pipeline_en.md b/docs/_posts/ahmedlone127/2024-09-25-yahoo2_pipeline_en.md new file mode 100644 index 00000000000000..1b3a98717af3e9 --- /dev/null +++ b/docs/_posts/ahmedlone127/2024-09-25-yahoo2_pipeline_en.md @@ -0,0 +1,70 @@ +--- +layout: model +title: English yahoo2_pipeline pipeline BertForSequenceClassification from Lumos +author: John Snow Labs +name: yahoo2_pipeline +date: 2024-09-25 +tags: [en, open_source, pipeline, onnx] +task: Text Classification +language: en +edition: Spark NLP 5.5.0 +spark_version: 3.0 +supported: true +annotator: PipelineModel +article_header: + type: cover +use_language_switcher: "Python-Scala-Java" +--- + +## Description + +Pretrained BertForSequenceClassification, adapted from Hugging Face and curated to provide scalability and production-readiness using Spark NLP.`yahoo2_pipeline` is a English model originally trained by Lumos. + +{:.btn-box} + + +[Download](https://s3.amazonaws.com/auxdata.johnsnowlabs.com/public/models/yahoo2_pipeline_en_5.5.0_3.0_1727287225807.zip){:.button.button-orange.button-orange-trans.arr.button-icon} +[Copy S3 URI](s3://auxdata.johnsnowlabs.com/public/models/yahoo2_pipeline_en_5.5.0_3.0_1727287225807.zip){:.button.button-orange.button-orange-trans.button-icon.button-copy-s3} + +## How to use + + + +
+{% include programmingLanguageSelectScalaPythonNLU.html %} +```python + +pipeline = PretrainedPipeline("yahoo2_pipeline", lang = "en") +annotations = pipeline.transform(df) + +``` +```scala + +val pipeline = new PretrainedPipeline("yahoo2_pipeline", lang = "en") +val annotations = pipeline.transform(df) + +``` +
+ +{:.model-param} +## Model Information + +{:.table-model} +|---|---| +|Model Name:|yahoo2_pipeline| +|Type:|pipeline| +|Compatibility:|Spark NLP 5.5.0+| +|License:|Open Source| +|Edition:|Official| +|Language:|en| +|Size:|409.4 MB| + +## References + +https://huggingface.co/Lumos/yahoo2 + +## Included Models + +- DocumentAssembler +- TokenizerModel +- BertForSequenceClassification \ No newline at end of file